# Install packages
install.packages("broom")
install.packages("ggplot2")
install.packages("glmnet")
install.packages("gt")
install.packages("nhanesA")
install.packages("randomForest")
install.packages("scales")
install.packages("survey")
install.packages("tidyverse")
install.packages("tidymodels")
install.packages("xgboost")
install.packages("vip")
# Install dietaryindex packages
devtools::install_github("zxucmu/dietaryindexNDP", ref = "dietaryindexNDP_1.0.0")Dietary Determinants of Allergic, Immunologic, and Cardiometabolic Diseases
BMIN5030/EPID600 Final Project
1 Overview
This project used data from the nationally representative National Health and Nutrition Examination Survey (NHANES) to explore links between dietary determinants and allergic, immunologic, and cardiometabolic outcomes in U.S. adults, focusing on those between the ages of 20 and 40. Using survey-weighted univariable and multivariable regression analyses, I examined whether diet quality - primarily assessed via the Dietary Inflammatory Index - and various sociodemographic (e.g., age, sex, race/ethnicity, poverty-income ratio), behavioral (e.g., alcohol, smoking), and clinical (e.g., body mass index [BMI], blood pressure, hemoglobin A1c [HbA1c], vitamin D level) features were associated with self-reported asthma, allergic rhinitis, inflammatory arthritis, eczema, food allergy, diabetes, and hypertension. In addition, I used exploratory machine learning models to identify predictors that may contribute to risk across these diseases.
I discussed this project with Dr. Robert Grundmeier (Professor of Pediatrics at CHOP and Penn, Department of Biomedical and Health Informatics) and Dr. Rich Tsui (Professor of Anesthesiology and Critical Care at CHOP and Penn, Department of Anesthesiology and Critical Care Medicine). With Dr. Grundmeier, I discussed cohort definitions, application of survey weights, and approach for regression analyses. With both Dr. Grundmeier and Dr. Tsui, I discussed the approach for machine learning analyses.
GitHub repository link: Final Project Repo
2 Introduction
Over the past several decades, rates of allergic conditions – including asthma, allergic rhinitis, food allergy, and eczema - have risen substantially worldwide, driven by mechanisms that remain only partially understood [1]. A concurrent increase has occurred in cardiometabolic diseases, such as obesity, hypertension, and diabetes [2]. Allergic, immunologic, and cardiometabolic conditions share comorbidity. Together, they reflect a state of immune dysregulation that occurs due to a complex interplay among genetic susceptibility, environmental exposures, and immune-metabolic pathways. There is growing recognition that dietary factors influence immune development beginning early in life and continuing into adulthood. Diet quality, nutrient intake patterns, and anthropometric factors such as adiposity influence inflammation, epithelial barrier function, and the gut microbiome, which in turn affect disease susceptibility [3,4]. While a modulatory role for diet has been been implicated in select allergic (e.g., asthma) and cardiometabolic (e.g., obesity) conditions [5,6], knowledge gaps remain about the potential modulatory role in other allergic and autoimmune diseases. This knowledge could help guide how dietary approaches might be integrated into clinical care to prevent or manage these chronic conditions.
NHANES is a continuous, nationally representative survey that integrates interviews, physical examinations, and laboratory assessments to characterize the health and nutritional status of the U.S. population. It provides detailed information on demographics, health behaviors, dietary recalls, biomarkers, anthropometrics, and clinical outcomes, making it well suited for population-level nutrition research [7]. The Dietary Inflammatory Index (DII) is an established, literature-derived score that quantifies the inflammatory potential of an individual’s diet based on intake of nutrients and food components known to influence systemic inflammation. To compute standardized DII values in NHANES, this study used the dietaryindexNDP package, which links NHANES dietary recall data with nutrient composition information from the U.S. Department of Agriculture’s Food and Nutrient Database for Dietary Studies (FNDDS) and food group equivalents from the Food Patterns Equivalents Database (FPED) [8,9]. Leveraging these tools, I examined associations between dietary, sociodemographic, and clinical factors and a set of allergic-immunologic outcomes (asthma, allergic rhinitis, inflammatory arthritis, eczema, and food allergy) as well as cardiometabolic outcomes (diabetes and hypertension). I specifically focused on adults aged 20-40 years, a sub-population that may be particularly amenable to dietary interventions aimed at disease prevention or management. In this study, I applied survey-weighted logistic regression along with exploratory machine learning models to assess whether dietary factors and related covariates are associated with risk across these various immune dysregulatory conditions.
This work is inherently multidisciplinary, bridging allergy–immunology, nutrition, epidemiology, and biomedical informatics. Informatics approaches were essential to this project, from extracting and integrating NHANES data to applying the survey design structure and implementing the regression and machine learning analyses. The integration of my prior training in allergy-immunology, clinical nutrition, epidemiology, and statistics with my current training in informatics uniquely equipped me to complete this project and interpret its findings with a cross-disciplinary lens. This project was supported by mentorship from Drs. Robert Grundmeier and Rich Tsui, whose guidance was instrumental in developing cohort definitions, applying survey-weighted design, and shaping the analytic approach for both the regression and machine learning analyses.
3 Methods
Data source, cohort extraction, and data harmonization
Using NHANES, I performed a cross-sectional analysis focused on non-pregnant adults ages 20-40 years from 2005 to 2012. This age and time range were selected to target a younger adult population that may be more amenable to dietary interventions for prevention or management of chronic diseases. Additional reasons for this sub-population choice included (1) avoidance of incorporating the increasing medical complexity associated with older age, and (2) partial mitigation of missingness of key variables that were less available for other age groups across NHANES cycles. I pooled four continuous two-year NHANES cycles (2005-2006, 2007-2008, 2009-2010, and 2011-2012). For the main analyses (asthma, allergic rhinitis, inflammatory arthritis, diabetes, hypertension), data from the full pooled period (2005–2012) were used. Because eczema and food allergy were only measured in select cycles, I additionally constructed time-restricted subcohorts for eczema (2005-2006) and food allergy (2007-2010).
Covariates included sociodemographic variables (e.g., age, sex, race/ethnicity), clinical characteristics (e.g., BMI, waist-to-height ratio, vitamin D [25(OH)D] level), and self-reported allergic-immunologic conditions (asthma, allergic rhinitis, inflammatory arthritis [i.e., rheumatoid or psoriatic arthritis], eczema, food allergy) and cardiometabolic diseases (diabetes, hypertension). Two-year survey weights were rescaled to generate weights for analysis of pooled NHANES cycles. Data from multiple NHANES modules were merged into a single dataset, and variables were harmonized by recoding them into consistent categories across cycles. Using the dietaryindexNDP package, the Dietary Inflammatory Index (DII) was computed based on 24-hour dietary recall data and was represented as both as a continuous score (with and without alcohol intake included) and as quartiles (Q1–Q4), corresponding to most anti-inflammatory to most pro-inflammatory, as previously described [10]. To reduce the influence of extreme values and to create clinically meaningful comparison groups, select variables (e.g., BMI, vitamin D levels) were converted into quartiles prior to incorporation into models. I performed data manipulation with the tidyverse set of packages, formatted outputs with broom and scales packages, generated figures using the ggplot2 package, and generated tables using the gt package.
All extracted variables, their types (i.e., continuous, categorical, binary), distributions, and missingness are summarized in Table 1. In general, for sociodemographic and behavioral variables, missingness was low, with most variables (e.g., age, sex, education, insurance, smoking, physical activity, family history) exhibiting a missingness rate of less than 2%, while other variables (e.g. race/ethnicity, alcohol use, poverty-income ratio, food security) showed a range of missingness rates between 7% and 20%. For cardiometabolic and nutrient labs, the missingness rate was 8%-10%. The rate of missingness for primary clinical outcomes was low (less than 1%). For food allergy and eczema, the missingness (calculated relative to the full cohort) was expected given the unavailability of these clinical outcomes in many survey cycles. Hence, time-restricted comparator cohorts were used for secondary analyses for these specific clinical outcomes. Subsequent regression and machine learning analyses used complete-case data.
Weighted prevalence analysis
Using the survey package, the survey-weighted prevalence of all allergic, immunologic, and cardiometabolic conditions was determined. In addition, to assess how prevalence varied by dietary inflammatory status, survey-weighted prevalence was determined within DII quartiles (Q1-Q4). Statistical significance was determined in survey-weighted logistic regression models of each outcome on DII quartile, comparing Q2-Q4 with Q1 using Wald tests.
Univariable and multivariable regression analyses
Survey-weighted regression analyses were performed using the svyglm function in the survey package. After assessing models for overdispersion, quasibinomial survey-weighted logistic regression was employed. First, univariable analyses were performed to examine associations between each outcome and a core set of sociodemographic, behavioral, clinical, and dietary predictors. An additional exploratory set of univariable analyses evaluated associations between each outcome and individual DII nutrient components. For all univariable models, odds ratios (ORs) with 95% confidence intervals (CIs) were reported, and p values were adjusted using the Benjamini-Hochberg method to account for multiple comparisons.
Next, survey-weighted multivariable regression analyses were performed to evaluate adjusted associations between continuous DII and each disease outcome. Three models were constructed. Model 1 adjusted for age and sex. Model 2 additionally adjusted for race/ethnicity, insurance status, poverty-income ratio, and education level. Model 3 further adjusted for BMI category, physical activity, and smoking status. For each model, the OR, 95% CIs and p values were determined, and results were visualized using Forest plots.
Effect modification analyses
I assessed whether the association between continuous DII and each disease outcome differed by sex, BMI category, or vitamin D category using survey-weighted logistic regression models that included all Model 3 covariates from the multivariable regression analyses. Sex contributed one interaction term (DII x sex), whereas the multi-category BMI and vitamin D variables contributed multiple interaction terms with DII. I determined stratum-specific ORs and 95% CIs for DII within each modifier level. Statistical significance of effect modification was evaluated using Wald tests combining all interaction coefficients for each modifier.
Machine learning analyses
I developed exploratory machine learning models to predict disease outcomes (asthma, allergic rhinitis, inflammatory arthritis, hypertension, diabetes, eczema, and food allergy) using the tidymodels set of packages for preprocessing, data sampling, model tuning, and performance evaluation. Two predictor sets were examined. Set 1 included core sociodemographic, behavioral, anthropometric, and clinical variables as well as continuous DII scores (23 variables). Set 2 included individual nutrient predictors (26 variables). Four algorithms were trained for each outcome, namely logistic regression (LR), LASSO-penalized logistic regression (glmnet package), Random Forest (RF; randomForest package), and XGBoost (XGB; xgboost package). Logistic regression was chosen as a baseline comparator model, LASSO was chosen for its ability to shrink coefficients and perform variable selection, and RF and XGB were chosen due to their ability to capture complex non-linear relationships.
For each outcome and predictor set, data were split into a training set (80%) and a test set (20%). Within the training set, 10-fold cross-validation was used for preprocessing and hyperparameter tuning. Preprocessing steps were implemented using tidymodels recipes. Hyperparameters for LASSO (penalty), RF (mtry), and XGB (number of trees, tree depth, learning rate) were tuned using the cross-validated area under the receiver operating characteristic curve (AUROC). Model performance on the test set was quantified using AUROC. To identify the topmost predictors, I computed model-specific variable importance scores using the vip package. For LR models, variable importance referred to the absolute value of standardized regression coefficients. For LASSO models, variable importance was based on the absolute value of the penalized coefficients after shrinkage. For RF models, variable importance was derived from tree-based split metrics (i.e., increase in node purity associated with each variable). Finally, for XGB, variable importance reflected the gain (i.e., how much each predictor improved the model’s accuracy when used in boosted trees).
Code to install and load packages
# Load packages
library(broom)
library(dietaryindexNDP)
library(ggplot2)
library(glmnet)
library(gt)
library(nhanesA)
library(randomForest)
library(scales)
library(survey)
library(tidymodels)
library(tidyverse)
library(xgboost)
library(vip)Code to generate master analytic dataset and summarize variable types, distributions, and missingness
The code below was used to generate the master analytic dataset for subsequent regression and machine learning analyses. Briefly, custom helper functions were used with the nhanesA package to streamline downloading NHANES modules of interest (i.e., demographics, psychosocial variables, medical conditions, anthropometrics, laboratory results, dietary variables, etc.) across four survey cycles (2005–2012). These were useful for exploring survey-cycle and variable combinations while optimizing the cohort parameters for the final cohort selection. The set of tidyverse packages was used to clean and merge data. I then used the dietaryindexNDP package to compute Dietary Inflammatory Index (DII) scores (with and without alcohol) and individual nutrient components. All cleaned modules were joined by the participant ID into a single dataset. I then harmonized disease outcomes (i.e., asthma, allergic rhinitis, inflammatory arthritis, eczema, food allergy, diabetes, hypertension) and converted selected variables (e.g., BMI, vitamin D level) into clinically relevant categories. The cohort was restricted to non-pregnant adults 20-40 years of age, and NHANES sampling weights were rescaled to account for pooling of survey cycles. Using the survey package, I generated outcome-specific analytic subsets with corresponding mobile examination center (MEC)-weighted survey designs for subsequent regression analyses. A data dictionary table (Table 1) summarizes all variables, their types, sample sizes, value ranges, categorical levels, and missingness. Additional coding details are included as annotations within the code chunk below.
# Helper functions-----------------------------------------------------------------------------
## Helper function to load NHANES module across survey cycles, including options to restrict to select variables and provide status summaries
load_nhanes_module <- function(module, cycles_vec = cycles,
keep_vars = NULL,
quiet = TRUE) { # set to FALSE for debugging messages
module_years <- paste0(module, "_", cycles_vec)
status <- tibble(
file_tag = module_years,
ok = FALSE,
n_rows = NA_integer_,
note = NA_character_
)
dfs <- vector("list", length(module_years))
for (i in seq_along(module_years)) {
x <- module_years[i]
out <- tryCatch(
{
if (!quiet) {
message("Attempting to download: ", x)
}
df <- nhanes(x) |> as.data.frame()
if (!is.null(keep_vars)) {
df <- df |> select(any_of(keep_vars))
}
n <- nrow(df)
status$n_rows[i] <- n
if (n == 0) {
status$ok[i] <- FALSE
status$note[i] <- "Downloaded, 0 rows"
if (!quiet) {
warning("File ", x, " downloaded but has 0 rows.")
}
df <- NULL
} else {
status$ok[i] <- TRUE
status$note[i] <- "Loaded"
df$file_tag <- x
}
df
},
error = function(e) {
status$ok[i] <- FALSE
status$note[i] <- paste("ERROR:", conditionMessage(e))
if (!quiet) {
warning("File ", x, " could not be downloaded: ", conditionMessage(e))
}
return(NULL)
}
)
dfs[[i]] <- out
}
dfs_nonempty <- dfs[!vapply(dfs, is.null, logical(1))]
if (!quiet) {
message("\nSummary for module ", module, ":")
print(status)
}
if (length(dfs_nonempty) == 0) {
stop("No cycles successfully downloaded for module: ", module)
}
out <- bind_rows(dfs_nonempty)
### Missing variables filled with NA
if (!is.null(keep_vars)) {
missing_vars <- setdiff(keep_vars, names(out))
if (length(missing_vars) > 0) {
for (v in missing_vars) {
out[[v]] <- NA_real_
}
}
}
out
}
## Helper function to convert yes/no (1/2) variables into 0/1
to01 <- function(x) {
x <- as.character(x)
case_when(
x %in% c("1", "Yes", "YES", "yes") ~ 1,
x %in% c("2", "No", "NO", "no") ~ 0,
TRUE ~ NA_real_
)
}
## Helper function to convert survey cycle numbers to cycle labels
cycle_label <- function(s) {
years <- str_extract(as.character(s), "\\d{4}-\\d{4}")
case_when(
!is.na(years) ~ years,
TRUE ~ recode(
as.numeric(s),
`4` = "2005-2006",
`5` = "2007-2008",
`6` = "2009-2010",
`7` = "2011-2012",
.default = NA_character_
)
)
}
## Helper function to summarize variable availability and missingness by cycle
check_var_availability <- function(data, vars = NULL) {
if (is.null(vars)) {
vars <- setdiff(
names(data),
c(
"seqn", "wtint2yr", "wtmec2yr", "wtint8yr", "wtmec8yr",
"sdmvpsu", "sdmvstra", "sddsrvyr", "cycle_str"
)
)
}
map_dfr(vars, function(v) {
data |>
group_by(cycle = sddsrvyr) |>
summarise(
variable = v,
total = n(),
n_nonmissing = sum(!is.na(.data[[v]])),
n_missing = sum(is.na(.data[[v]])),
prop_missing = mean(is.na(.data[[v]])),
.groups = "drop"
)
}) |>
arrange(variable, cycle)
}
## Restrict analysis NHANES cycles between 2005 and 2012 (i.e., D–G)
cycles <- c("D", "E", "F", "G")
# Load NHANES modules--------------------------------------------------------------------------
## Load demographic variables within DEMO module (i.e., demographics, survey design variables)
demo_keep <- c(
"SEQN", # Participant ID
"RIDAGEYR", # Age
"RIAGENDR", # Sex (1 = Male, 2 = Female)
"RIDRETH1", # Race/ethnicity (5-level)
"DMDEDUC2", # Education level
"INDFMPIR", # Poverty-income ratio
"SDDSRVYR", # Survey cycle indicator
"SDMVPSU", # Primary sampling unit (PSU) for survey design
"SDMVSTRA", # Stratification variable for survey design
"WTINT2YR", # 2-year interview weight
"WTMEC2YR", # 2-year MEC exam weight
"RIDEXPRG" # Pregnancy status for females
)
demo <- load_nhanes_module("DEMO", keep_vars = demo_keep)
## Load medical variables within MCQ module (i.e., asthma, family history of asthma, arthritis)
mcq_keep <- c(
"SEQN",
"MCQ010", # Asthma
"MCQ300B", # Family history asthma
"MCQ160A", # Arthritis yes/no
"MCQ190", # Arthritis type (2005–08)
"MCQ191", # Arthritis type (2009–10)
"MCQ195" # Arthritis type (2011–12)
)
mcq <- load_nhanes_module("MCQ", keep_vars = mcq_keep)
## Load eczema and allergic rhinitis data within AGQ module for 2005–2006
agq_d <- load_nhanes_module(
"AGQ",
cycles_vec = "D",
keep_vars = c("SEQN","AGQ010","AGQ180")
)
## Load allergic rhinitis data within RDQ module for 2007–2012
rdq <- load_nhanes_module(
"RDQ",
cycles_vec = c("E","F","G"),
keep_vars = c("SEQN","AGQ030")
)
## Load food allergy data within DBQ module for 2007–2010
dbq <- load_nhanes_module("DBQ", keep_vars = c("SEQN","DBQ920"))
## Load diabetes status within DIQ module
diq <- load_nhanes_module("DIQ", keep_vars = c("SEQN","DIQ010"))
## Load hypertension within BPQ module
bpq <- load_nhanes_module("BPQ", keep_vars = c("SEQN","BPQ020"))
## Load anthropometrics within BMX module
bmx <- load_nhanes_module(
"BMX",
keep_vars = c("SEQN","BMXBMI","BMXWAIST","BMXHT","BMXWT")
)
## Load systolic and diastolic blood pressure readings within BPX module
bp <- load_nhanes_module(
"BPX",
keep_vars = c(
"SEQN",
"BPXSY1","BPXSY2","BPXSY3","BPXSY4",
"BPXDI1","BPXDI2","BPXDI3","BPXDI4"
)
)
## Load serum vitamin D [25(OH)D] measurements within VID module
vid <- load_nhanes_module(
"VID",
keep_vars = c("SEQN", "LBXVIDMS", "LBXVIDLC", "LBDVIDMS")
)
## Load lifetime and current smoking status within SMQ module
smq <- load_nhanes_module("SMQ", keep_vars = c("SEQN","SMQ020","SMQ040"))
## Load health insurance coverage within HIQ module
hiq <- load_nhanes_module("HIQ", keep_vars = c("SEQN","HIQ011"))
## Load household food security within FSQ module
fsq <- load_nhanes_module("FSQ", keep_vars = c("SEQN","FSDHH"))
## Load alcohol consumption within ALQ module
alq <- load_nhanes_module("ALQ", keep_vars = c("SEQN","ALQ101","ALQ130"))
## Load HbA1c within GHB module
hba1c <- load_nhanes_module("GHB", keep_vars = c("SEQN","LBXGH"))
## Load total and HDL cholesterol
tchol <- load_nhanes_module("TCHOL", keep_vars = c("SEQN","LBXTC"))
hdl <- load_nhanes_module("HDL", keep_vars = c("SEQN","LBDHDD"))
## Load physical activity variables within PAQ module
paq_keep <- c("SEQN","PAQ650","PAQ665","PAD200","PAD320")
paq <- load_nhanes_module("PAQ", keep_vars = paq_keep)
# Dietary Inflammatory Index (DII)-------------------------------------------------------------
## Define NHANES cycles (2005–2012)
dii_cycles <- c("0506", "0708", "0910", "1112")
## Obtain 24-hour DII results for NHANES cycle
get_dii_cycle <- function(cyc) {
message("Obtain DII for NHANES cycle ", cyc, " (Day 1)...")
DII_NHANES_PLUS_RESULT(
NHANESCycle = cyc,
DAY = "first"
) |>
mutate(cycle_code_ndp = cyc)
}
## Obtain and pool DII data across cycles
dii_all <- dii_cycles |>
map_dfr(get_dii_cycle)
## Keep DII total scores (with and without alcohol) and nutrient components
dii_clean <- dii_all |>
transmute(
seqn = SEQN,
dii_etoh_raw = DII_ALL,
dii_raw = DII_NOETOH,
kcal = KCAL,
carb = CARB,
protein = PROTEIN,
totalfat = TOTALFAT,
satfat = SATFAT,
pufa = PUFA,
mufa = MUFA,
n3fat = N3FAT,
n6fat = N6FAT,
choles = CHOLES,
vitA = VITA,
bcarotene = BCAROTENE,
vitC = VITC,
vitE = VITE,
vitB6 = VITB6,
vitB12 = VITB12,
folicacid = FOLICACID,
thiamin = THIAMIN,
riboflavin = RIBOFLAVIN,
niacin = NIACIN,
iron = IRON,
mg = MG,
zn = ZN,
se = SE,
fiber = FIBER,
caffeine = CAFFEINE
) |>
mutate(
dii = dii_raw,
dii_etoh = dii_etoh_raw
)
## Define full set of DII component nutrient variables
dii_component_vars <- c(
"kcal", "carb", "protein", "totalfat", "satfat",
"pufa", "mufa", "n3fat", "n6fat", "choles",
"vitA", "bcarotene", "vitC", "vitE",
"vitB6", "vitB12", "folicacid",
"thiamin", "riboflavin", "niacin",
"iron", "mg", "zn", "se",
"fiber", "caffeine"
)
# Data cleaning and harmonization -------------------------------------------------------------
## Harmonize physical activity variables across NHANES cycles
paq_clean <- paq |>
transmute(
seqn = SEQN,
paq650 = PAQ650, # Vigorous recreational activity ≥10 minutes (2007+)
paq665 = PAQ665, # Moderate recreational activity ≥10 minutes (2007+)
pad200 = PAD200, # Vigorous activity ≥10 minutes over past 30 days (2005–06)
pad320 = PAD320 # Moderate activity ≥10 minutes over past 30 days (2005–06)
)
## Clean demographic and design variables and derive numeric cycle code
demo_clean <- demo |>
mutate(
sddsrvyr_label = as.character(SDDSRVYR),
sddsrvyr = case_when(
grepl("2005-2006", sddsrvyr_label) ~ 4,
grepl("2007-2008", sddsrvyr_label) ~ 5,
grepl("2009-2010", sddsrvyr_label) ~ 6,
grepl("2011-2012", sddsrvyr_label) ~ 7,
TRUE ~ NA_real_
)
) |>
transmute(
seqn = SEQN, # Participant ID
age = as.numeric(RIDAGEYR), # Age (years)
sex_raw = RIAGENDR, # Sex
race_raw = RIDRETH1, # Race/ethnicity (1–5)
educ_raw = DMDEDUC2, # Education level
pir = INDFMPIR, # Poverty-income ratio
sddsrvyr,
cycle_str = cycle_label(sddsrvyr), # Cycle label
sdmvpsu = SDMVPSU, # PSU for survey design
sdmvstra = SDMVSTRA, # Stratum for survey design
wtint2yr = WTINT2YR, # Interview weight (2-year)
wtmec2yr = WTMEC2YR, # MEC weight (2-year)
preg = RIDEXPRG # Pregnancy status
)
## Clean medical conditions and derive indicators for asthma, family history of asthma, and arthritis
mcq_clean <- mcq |>
transmute(
seqn = SEQN, # Participant ID
asthma_raw = MCQ010, # Asthma
fh_asthma_raw = MCQ300B, # Family history asthma
arthritis_raw = MCQ160A, # Ever arthritis
mcq190 = MCQ190, # Arthritis type (2005–08)
mcq191 = MCQ191, # Arthritis type (2009–10)
mcq195 = MCQ195 # Arthritis type (2011–12)
) |>
mutate(
asthma = to01(asthma_raw),
fh_asthma = to01(fh_asthma_raw),
arthritis_ever = to01(arthritis_raw)
)
## Derive eczema status in 2005–2006 cycle
eczema_clean <- demo_clean |>
select(seqn, sddsrvyr) |>
left_join(
agq_d |>
transmute(
seqn = SEQN,
agq180_ecz = to01(AGQ180) # Eczema code
),
by = "seqn"
) |>
mutate(
ad = if_else(sddsrvyr == 4, agq180_ecz, NA_real_)
) |>
select(seqn, ad)
## Derive food allergy status in 2007–2010 cycles
fa_clean <- demo_clean |>
select(seqn, sddsrvyr) |>
left_join(
dbq |>
transmute(
seqn = SEQN,
dbq920_fa = to01(DBQ920) # Food allergy code
),
by = "seqn"
) |>
mutate(
fa = if_else(sddsrvyr %in% c(5, 6), dbq920_fa, NA_real_)
) |>
select(seqn, fa)
## Harmonize allergic rhinitis across 2005–2006 and 2007–2012
ar_clean <- demo_clean |>
select(seqn, sddsrvyr) |>
left_join(
agq_d |>
transmute(seqn = SEQN, ar_agq010 = to01(AGQ010)), # AR code (2005–06)
by = "seqn"
) |>
left_join(
rdq |>
transmute(seqn = SEQN, ar_agq030_rdq = to01(AGQ030)), # AR code (2007–12)
by = "seqn"
) |>
mutate(
ar = case_when(
sddsrvyr == 4 ~ ar_agq010,
sddsrvyr %in% c(5, 6, 7) ~ ar_agq030_rdq,
TRUE ~ NA_real_
)
) |>
select(seqn, ar)
## Derive diabetes indicator from DIQ010
diq_clean <- diq |>
transmute(
seqn = SEQN,
dm2_diag_chr = as.character(DIQ010)
) |>
mutate(
dm2 = case_when(
dm2_diag_chr %in% c("1", "Yes") ~ 1,
dm2_diag_chr %in% c("2","No","3","Borderline") ~ 0,
TRUE ~ NA_real_
)
) |>
select(seqn, dm2)
## Derive hypertension indicator from BPQ020
bpq_clean <- bpq |>
transmute(
seqn = SEQN,
htn_chr = as.character(BPQ020)
) |>
mutate(
htn = case_when(
htn_chr %in% c("1","Yes") ~ 1,
htn_chr %in% c("2","No") ~ 0,
TRUE ~ NA_real_
)
) |>
select(seqn, htn)
## Clean anthropometrics and derive standard measures
bmx_clean <- bmx |>
transmute(
seqn = SEQN,
bmi = BMXBMI, # Body mass index (kg/m^2)
waist_cm = BMXWAIST, # Waist circumference (cm)
height_cm = BMXHT, # Standing height (cm)
weight_kg = BMXWT # Weight (kg)
)
## Compute mean systolic and diastolic blood pressures across readings
bp_clean <- bp |>
mutate(
sbp_mean = rowMeans(across(BPXSY1:BPXSY4), na.rm = TRUE),
dbp_mean = rowMeans(across(BPXDI1:BPXDI4), na.rm = TRUE),
dbp_mean = if_else(dbp_mean == 0, NA_real_, dbp_mean)
) |>
transmute(
seqn = SEQN,
sbp_mean = if_else(is.nan(sbp_mean), NA_real_, sbp_mean),
dbp_mean = if_else(is.nan(dbp_mean), NA_real_, dbp_mean)
)
## Harmonize vitamin D [25(OH)D] across laboratory methods (take first non-missing value)
vid_clean <- vid |>
mutate(
vitD25oh = coalesce(LBXVIDMS, LBXVIDLC, LBDVIDMS)
) |>
transmute(
seqn = SEQN,
vitD25oh = vitD25oh
)
## Derive smoking status (never, former, current)
smq_clean <- smq |>
transmute(
seqn = SEQN,
smoked_100 = as.numeric(SMQ020),
smoke_now = as.numeric(SMQ040)
) |>
mutate(
smoking = case_when(
smoked_100 == 2 ~ "Never",
smoked_100 == 1 & smoke_now == 3 ~ "Former",
smoked_100 == 1 & smoke_now %in% c(1,2) ~ "Current",
TRUE ~ NA_character_
)
)
## Derive insurance indicator using HIQ011
hiq_clean <- hiq |>
transmute(
seqn = SEQN,
insured = to01(HIQ011)
)
## Classify household food security into categories
fsq_clean <- fsq |>
transmute(
seqn = SEQN,
foodsec_raw = as.numeric(FSDHH),
food_security = case_when(
foodsec_raw == 1 ~ "Full",
foodsec_raw == 2 ~ "Marginal",
foodsec_raw == 3 ~ "Low",
foodsec_raw == 4 ~ "Very low",
TRUE ~ NA_character_
)
)
## Derive alcohol indicator from ALQ101
alq_clean <- alq |>
transmute(
seqn = SEQN,
any_alcohol = case_when(
ALQ101 %in% c(1, "1", "Yes") ~ 1,
ALQ101 %in% c(2, "2", "No") ~ 0,
TRUE ~ NA_real_
)
)
## Keep HbA1c
hba1c_clean <- hba1c |>
transmute(
seqn = SEQN,
hba1c = LBXGH
)
## Keep total and HDL cholesterol
tchol_clean <- tchol |> transmute(seqn = SEQN, tchol = LBXTC)
hdl_clean <- hdl |> transmute(seqn = SEQN, hdl = LBDHDD)
# Merge modules and define analytic cohort ----------------------------------------------------
## Merge all cleaned modules to construct single NHANES analytic dataset
nhanes_all <- demo_clean |>
left_join(mcq_clean, by = "seqn") |>
left_join(eczema_clean, by = "seqn") |>
left_join(fa_clean, by = "seqn") |>
left_join(ar_clean, by = "seqn") |>
left_join(bmx_clean, by = "seqn") |>
left_join(bp_clean, by = "seqn") |>
left_join(vid_clean, by = "seqn") |>
left_join(smq_clean, by = "seqn") |>
left_join(hiq_clean, by = "seqn") |>
left_join(fsq_clean, by = "seqn") |>
left_join(alq_clean, by = "seqn") |>
left_join(hba1c_clean, by = "seqn") |>
left_join(tchol_clean, by = "seqn") |>
left_join(hdl_clean, by = "seqn") |>
left_join(dii_clean, by = "seqn") |>
left_join(diq_clean, by = "seqn") |>
left_join(bpq_clean, by = "seqn") |>
left_join(paq_clean, by = "seqn")
## Restrict analytic cohort and derive all final variables used in regression and machine learning analyses
nhanes_analytic <- nhanes_all |>
### Restrict to adults 20–40 years old, 2005–2012
filter(
!is.na(sddsrvyr),
sddsrvyr %in% 4:7,
age >= 20, age <= 40
) |>
### Exclude pregnant subjects
filter(
is.na(preg) |
!preg %in% c("Yes, positive lab pregnancy test or self-reported pregnant at exam")
) |>
mutate(
### Multi-cycle weights (pooled 8-year weights over 4 cycles)
wtint8yr = wtint2yr / length(cycles),
wtmec8yr = wtmec2yr / length(cycles),
### Sex factor
sex = case_when(
sex_raw %in% c(1, "1") ~ "Male",
sex_raw %in% c(2, "2") ~ "Female",
as.character(sex_raw) == "Male" ~ "Male",
as.character(sex_raw) == "Female" ~ "Female",
TRUE ~ NA_character_
),
sex = factor(sex, levels = c("Male", "Female")),
### Race/ethnicity factor
race = case_when(
race_raw %in% c(3, "3", "Non-Hispanic White") ~ "Non-Hispanic White",
race_raw %in% c(4, "4", "Non-Hispanic Black") ~ "Non-Hispanic Black",
race_raw %in% c(1, "1", "Mexican American") ~ "Mexican American",
race_raw %in% c(2, "2", "Other Hispanic") ~ "Other Hispanic",
race_raw %in% c(5, "5") ~ "Other/multiracial",
TRUE ~ NA_character_
),
race = factor(
race,
levels = c(
"Non-Hispanic White",
"Non-Hispanic Black",
"Mexican American",
"Other Hispanic",
"Other/multiracial"
)
),
### Education factor
educ_factor_raw = factor(educ_raw),
educ = case_when(
educ_factor_raw %in% c(
"Less Than 9th Grade",
"Less than 9th grade"
) ~ "<9 years",
educ_factor_raw %in% c(
"9-11th Grade (Includes 12th grade with no diploma)",
"9-11th grade (Includes 12th grade with no diploma)",
"High School Grad/GED or Equivalent",
"High school graduate/GED or equivalent"
) ~ "9–12 years",
educ_factor_raw %in% c(
"Some College or AA degree",
"Some college or AA degree",
"College Graduate or above",
"College graduate or above"
) ~ ">12 years",
TRUE ~ NA_character_
),
educ = factor(
educ,
levels = c("<9 years","9–12 years",">12 years")
),
### Smoking factor
smoking = factor(
smoking,
levels = c("Never", "Former", "Current")
),
### Physical activity factor
phys_act = case_when(
# 2005–2006 (sddsrvyr == 4): PAD200 / PAD320
sddsrvyr == 4 & to01(pad200) == 1 ~ "Intense",
sddsrvyr == 4 & to01(pad200) != 1 & to01(pad320) == 1 ~ "Moderate",
sddsrvyr == 4 & to01(pad200) == 0 & to01(pad320) == 0 ~ "Sedentary",
# 2007–2012 (sddsrvyr %in% 5:7): PAQ650 / PAQ665
sddsrvyr %in% c(5, 6, 7) & as.character(paq650) == "Yes" ~ "Intense",
sddsrvyr %in% c(5, 6, 7) &
as.character(paq650) == "No" & as.character(paq665) == "Yes" ~ "Moderate",
sddsrvyr %in% c(5, 6, 7) &
as.character(paq650) == "No" & as.character(paq665) == "No" ~ "Sedentary",
TRUE ~ NA_character_
),
phys_act = factor(
phys_act,
levels = c("Sedentary","Moderate","Intense")
),
### Food security factor
food_sec = factor(
food_security,
levels = c("Full","Marginal","Low","Very low"),
ordered = TRUE
),
### Insurance factor
insur = factor(
case_when(
insured == 1 ~ "Yes",
insured == 0 ~ "No",
TRUE ~ NA_character_
),
levels = c("No", "Yes")
),
### Alcohol factor
alcohol = factor(
case_when(
any_alcohol == 0 ~ "No",
any_alcohol == 1 ~ "Yes",
TRUE ~ NA_character_
),
levels = c("No", "Yes")
),
### Validity flags for outcomes with restricted survey windows
ad_valid = sddsrvyr == 4, # Eczema (2005–06)
fa_valid = sddsrvyr %in% c(5, 6), # Food allergy (2007–10)
### Inflammatory arthritis based on multiple arthritis questions
inflam_arthritis = {
mcq190_chr <- as.character(mcq190)
mcq191_chr <- as.character(mcq191)
mcq195_chr <- as.character(mcq195)
case_when(
#### Denial of ever having arthritis
arthritis_ever == 0 ~ 0,
#### 2005–2006 and 2007–2008: MCQ190 arthritis type
arthritis_ever == 1 & sddsrvyr %in% c(4, 5) &
grepl("Rhe", mcq190_chr, ignore.case = TRUE) ~ 1,
arthritis_ever == 1 & sddsrvyr %in% c(4, 5) &
grepl("Ost|Osteo|Oth", mcq190_chr, ignore.case = TRUE) ~ 0,
#### 2009–2010: MCQ191 arthritis type
arthritis_ever == 1 & sddsrvyr == 6 &
(grepl("Rhe", mcq191_chr, ignore.case = TRUE) |
grepl("Pso", mcq191_chr, ignore.case = TRUE)) ~ 1,
arthritis_ever == 1 & sddsrvyr == 6 &
grepl("Ost|Osteo|Oth", mcq191_chr, ignore.case = TRUE) ~ 0,
#### 2011–2012: MCQ195 arthritis type
arthritis_ever == 1 & sddsrvyr == 7 &
(grepl("Rhe", mcq195_chr, ignore.case = TRUE) |
grepl("Pso", mcq195_chr, ignore.case = TRUE)) ~ 1,
arthritis_ever == 1 & sddsrvyr == 7 &
grepl("Ost|Osteo|Oth", mcq195_chr, ignore.case = TRUE) ~ 0,
TRUE ~ NA_real_
)
},
### Final inflammatory arthritis, diabetes, and hypertension indicators
arthritis = inflam_arthritis,
diabetes = dm2,
htn = htn,
### Family history of asthma
fh_asthma = fh_asthma,
### Anthropometric ratio and non-HDL cholesterol
whtr = waist_cm / height_cm,
non_hdl = if_else(!is.na(tchol) & !is.na(hdl), tchol - hdl, NA_real_),
### Vitamin D categories
vitD25oh_cat = case_when(
is.na(vitD25oh) ~ NA_character_,
vitD25oh < 20 ~ "<20 (deficient)",
vitD25oh >= 20 & vitD25oh < 30 ~ "20–29 (insufficient)",
vitD25oh >= 30 ~ "≥30 (sufficient)"
),
vitD25oh_cat = factor(
vitD25oh_cat,
levels = c("<20 (deficient)",
"20–29 (insufficient)",
"≥30 (sufficient)"),
ordered = TRUE
),
### DII quartiles
dii_quart = ntile(dii, 4),
dii_quart = factor(
dii_quart,
levels = c(1, 2, 3, 4),
labels = c(
"Q1 (most anti-inflammatory)",
"Q2",
"Q3",
"Q4 (most pro-inflammatory)"
),
ordered = TRUE
),
### Pregnancy status
preg_flag = case_when(
preg == "Yes, positive lab pregnancy test or self-reported pregnant at exam" ~ 1,
preg %in% c("SP not pregnant at exam",
"The participant was not pregnant at exam") ~ 0,
preg %in% c("Cannot ascertain if SP is pregnant at exam",
"Cannot ascertain if the participant is pregnant at exam") ~ NA_real_,
TRUE ~ NA_real_
),
preg_status = factor(
case_when(
preg_flag == 1 ~ "Pregnant",
preg_flag == 0 ~ "Not pregnant",
TRUE ~ NA_character_
),
levels = c("Not pregnant", "Pregnant")
),
### Quartiles for DII component nutrients
across(
all_of(dii_component_vars),
~ ntile(., 4),
.names = "{.col}_quart"
),
across(
all_of(paste0(dii_component_vars, "_quart")),
~ factor(
.,
levels = c(1, 2, 3, 4),
labels = c("Q1", "Q2", "Q3", "Q4"),
ordered = TRUE
)
),
### BMI categories
bmi_cat = case_when(
is.na(bmi) ~ NA_character_,
bmi < 18.5 ~ "Underweight (<18.5)",
bmi < 25 ~ "Normal (18.5–24.9)",
bmi < 30 ~ "Overweight (25–29.9)",
bmi < 35 ~ "Obesity I (30–34.9)",
bmi < 40 ~ "Obesity II (35–39.9)",
TRUE ~ "Obesity III (≥40)"
),
bmi_cat = factor(
bmi_cat,
levels = c(
"Underweight (<18.5)",
"Normal (18.5–24.9)",
"Overweight (25–29.9)",
"Obesity I (30–34.9)",
"Obesity II (35–39.9)",
"Obesity III (≥40)"
),
ordered = TRUE
),
### Waist-to-height ratio categories
whtr_cat = case_when(
is.na(whtr) ~ NA_character_,
whtr < 0.5 ~ "<0.5",
whtr < 0.6 ~ "0.5–0.59",
TRUE ~ "≥0.6"
),
whtr_cat = factor(
whtr_cat,
levels = c("<0.5", "0.5–0.59", "≥0.6"),
ordered = TRUE
),
### SBP categories
sbp_cat = case_when(
is.na(sbp_mean) ~ NA_character_,
sbp_mean < 120 ~ "<120",
sbp_mean < 130 ~ "120–129",
sbp_mean < 140 ~ "130–139",
sbp_mean < 160 ~ "140–159",
TRUE ~ "≥160"
),
sbp_cat = factor(
sbp_cat,
levels = c("<120", "120–129", "130–139", "140–159", "≥160"),
ordered = TRUE
),
### DBP categories
dbp_cat = case_when(
is.na(dbp_mean) ~ NA_character_,
dbp_mean < 80 ~ "<80",
dbp_mean < 90 ~ "80–89",
dbp_mean < 100 ~ "90–99",
TRUE ~ "≥100"
),
dbp_cat = factor(
dbp_cat,
levels = c("<80", "80–89", "90–99", "≥100"),
ordered = TRUE
),
### Numeric variables for ordered categories (for trend tests)
dii_quart_trend = if_else(!is.na(dii_quart), as.numeric(dii_quart) - 1, NA_real_),
food_sec_score = if_else(!is.na(food_sec), as.numeric(food_sec) - 1, NA_real_),
vitD25oh_cat_trend = if_else(!is.na(vitD25oh_cat), as.numeric(vitD25oh_cat) - 1, NA_real_),
bmi_cat_trend = if_else(!is.na(bmi_cat), as.numeric(bmi_cat) - 1, NA_real_),
whtr_cat_trend = if_else(!is.na(whtr_cat), as.numeric(whtr_cat) - 1, NA_real_),
sbp_cat_trend = if_else(!is.na(sbp_cat), as.numeric(sbp_cat) - 1, NA_real_),
dbp_cat_trend = if_else(!is.na(dbp_cat), as.numeric(dbp_cat) - 1, NA_real_),
phys_act_trend = if_else(!is.na(phys_act), as.numeric(phys_act) - 1, NA_real_)
)
# Finalize analytic dataset for regression and machine learning -------------------------------
## Construct final analytic dataset with all needed variables
nhanes_analytic <- nhanes_analytic |>
select(
### IDs and survey design
seqn,
sddsrvyr, cycle_str, sdmvpsu, sdmvstra,
wtint2yr, wtmec2yr, wtint8yr, wtmec8yr,
### Demographics
age, # Age
sex, # Sex (Male/Female)
race, # Race/ethnicity (5 categories)
educ, # Education (<9, 9–12, >12 years)
pir, # Poverty-income ratio
### Psychosocial and behavioral factors
smoking, # Smoking status (Never/Former/Current)
phys_act, # Physical activity (Sedentary/Moderate/Intense)
phys_act_trend, # Physical activity trend (0–2)
food_sec, # Food security (Full/Marginal/Low/Very low)
food_sec_score, # Food security score (0–3)
insur, # Health insurance (Yes/No)
any_alcohol, # Self-reported alcohol use (0/1; from ALQ101)
alcohol, # Self-reported alcohol use (No/Yes factor)
### Anthropometrics and cardiometabolic variables
bmi, # Body mass index (kg/m^2)
whtr, # Waist-to-height ratio
sbp_mean, # Mean systolic BP (mmHg)
dbp_mean, # Mean diastolic BP (mmHg)
tchol, # Total cholesterol (mg/dL)
hdl, # HDL cholesterol (mg/dL)
non_hdl, # Non-HDL cholesterol (mg/dL)
hba1c, # HbA1c (%)
### Anthropometric and BP categories
bmi_cat,
bmi_cat_trend, # BMI category trend (0–5)
whtr_cat,
whtr_cat_trend, # WHtR category trend (0–2)
sbp_cat,
sbp_cat_trend, # SBP category trend (0–4)
dbp_cat,
dbp_cat_trend, # DBP category trend (0–3)
### Vitamin D and DII summary scores
vitD25oh, # Serum Vitamin D [25(OH)D] (nmol/L)
vitD25oh_cat, # Vitamin D categories
vitD25oh_cat_trend, # Vitamin D category trend (0–2)
dii, # DII (no alcohol)
dii_quart, # DII quartiles (ordered)
dii_quart_trend, # DII quartile trend (0–3)
dii_etoh, # DII including alcohol (secondary)
### DII component nutrient quartiles
kcal_quart, carb_quart, protein_quart, totalfat_quart, satfat_quart,
pufa_quart, mufa_quart, n3fat_quart, n6fat_quart, choles_quart,
vitA_quart, bcarotene_quart, vitC_quart, vitE_quart,
vitB6_quart, vitB12_quart, folicacid_quart, thiamin_quart,
riboflavin_quart, niacin_quart, iron_quart, mg_quart, zn_quart, se_quart,
fiber_quart, caffeine_quart,
### Disease outcomes and family history
asthma, # Asthma
ar, # Allergic rhinitis
arthritis, # Inflammatory arthritis
diabetes, # Diabetes
htn, # Hypertension
ad, # Eczema (2005–06)
fa, # Food allergy (2007–10)
fh_asthma # Family history of asthma
)
## Set reference levels for key categorical variables in regression models
nhanes_analytic <- nhanes_analytic |>
mutate(
race = fct_relevel(as.factor(race), "Non-Hispanic White"),
educ = fct_relevel(as.factor(educ), ">12 years"),
vitD25oh_cat = fct_relevel(as.factor(vitD25oh_cat), "≥30 (sufficient)"),
dii_quart = fct_relevel(as.factor(dii_quart), "Q1 (most anti-inflammatory)"),
bmi_cat = fct_relevel(as.factor(bmi_cat), "Normal (18.5–24.9)")
)
## Save analytic dataset for later use
saveRDS(nhanes_analytic, file = "nhanes_analytic.rds")
## Add a constant for total population estimates
nhanes_analytic <- nhanes_analytic |> mutate(one = 1)
## Configure survey options for lonely PSUs
options(survey.lonely.psu = "adjust")
## Define interview-weighted survey design (8-year pooled weights)
des_int_8yr <- svydesign(
ids = ~sdmvpsu,
strata = ~sdmvstra,
weights = ~wtint8yr,
nest = TRUE,
data = nhanes_analytic
)
## Define MEC-weighted survey design (8-year pooled weights)
des_mec_8yr <- svydesign(
ids = ~sdmvpsu,
strata = ~sdmvstra,
weights = ~wtmec8yr,
nest = TRUE,
data = nhanes_analytic
)
## Total weighted population using MEC weights
total_pop_mec_8yr <- svytotal(~one, design = des_mec_8yr)
# Regression dataset generation ---------------------------------------------------------------
## Set survey and contrast options for regression models
options(survey.lonely.psu = "adjust")
options(contrasts = c("contr.treatment", "contr.treatment"))
## Ensure presence of constant variable for total population estimates
if (!"one" %in% names(nhanes_analytic)) {
nhanes_analytic <- nhanes_analytic |>
mutate(one = 1)
}
## Construct outcome-specific analytic datasets restricted to non-missing outcomes
### Asthma non-missing
asthma_reg <- nhanes_analytic |>
filter(!is.na(asthma))
### Allergic rhinitis non-missing
ar_reg <- nhanes_analytic |>
filter(!is.na(ar))
### Inflammatory arthritis non-missing
inflam_reg <- nhanes_analytic |>
filter(!is.na(arthritis))
### Hypertension non-missing
htn_reg <- nhanes_analytic |>
filter(!is.na(htn))
### Diabetes non-missing
dm_reg <- nhanes_analytic |>
filter(!is.na(diabetes))
### Eczema non-missing
ad_reg <- nhanes_analytic |>
filter(sddsrvyr == 4, !is.na(ad))
### Food allergy non-missing
fa_reg <- nhanes_analytic |>
filter(sddsrvyr %in% c(5, 6), !is.na(fa))
## Define helper function to construct outcome-specific survey designs using 8-year MEC weights
make_design_wt8yr <- function(data) {
svydesign(
ids = ~sdmvpsu,
strata = ~sdmvstra,
weights = ~wtmec8yr,
nest = TRUE,
data = data
) |>
subset(wtmec8yr > 0)
}
## Construct outcome-specific survey designs for regression analyses
des_asthma_reg <- make_design_wt8yr(asthma_reg)
des_ar_reg <- make_design_wt8yr(ar_reg)
des_inflam_reg <- make_design_wt8yr(inflam_reg)
des_htn_reg <- make_design_wt8yr(htn_reg)
des_dm_reg <- make_design_wt8yr(dm_reg)
des_ad_reg <- make_design_wt8yr(ad_reg)
des_fa_reg <- make_design_wt8yr(fa_reg)
## Define list structure to organize outcome-specific datasets and survey designs
outcome_designs <- list(
asthma = list(
var = "asthma",
label = "Asthma",
data = asthma_reg,
design = des_asthma_reg
),
ar = list(
var = "ar",
label = "Allergic rhinitis",
data = ar_reg,
design = des_ar_reg
),
inflam = list(
var = "arthritis",
label = "Inflammatory arthritis",
data = inflam_reg,
design = des_inflam_reg
),
htn = list(
var = "htn",
label = "Hypertension",
data = htn_reg,
design = des_htn_reg
),
dm = list(
var = "diabetes",
label = "Diabetes",
data = dm_reg,
design = des_dm_reg
),
ad = list(
var = "ad",
label = "Eczema",
data = ad_reg,
design = des_ad_reg
),
fa = list(
var = "fa",
label = "Food allergy",
data = fa_reg,
design = des_fa_reg
)
)
# Variable information summary-----------------------------------------------------------------
## Full variable labels
var_labels <- c(
### IDs / survey design
seqn = "Participant ID",
sddsrvyr = "Survey cycle",
cycle_str = "Survey cycle label",
sdmvpsu = "Masked variance pseudo-PSU",
sdmvstra = "Masked variance pseudo-stratum",
wtint2yr = "Interview weight (2-year)",
wtmec2yr = "MEC exam weight (2-year)",
wtint8yr = "Interview weight (8-year pooled)",
wtmec8yr = "MEC exam weight (8-year pooled)",
### Demographics
age = "Age in years",
sex = "Sex",
race = "Race/ethnicity",
educ = "Education level",
pir = "Poverty-income ratio",
### Behavioral
smoking = "Smoking status",
phys_act = "Recreational physical activity level",
phys_act_trend = "Physical activity trend score",
food_sec = "Household food security status",
food_sec_score = "Food security ordinal score",
insur = "Any health insurance coverage",
any_alcohol = "Ever drank ≥12 drinks in any 1 year (ALQ101)",
alcohol = "Alcohol use",
### Anthropometric and cardiometabolic
bmi = "Body mass index (kg/m^2)",
whtr = "Waist-to-height ratio",
sbp_mean = "Mean systolic blood pressure (mmHg)",
dbp_mean = "Mean diastolic blood pressure (mmHg)",
tchol = "Total cholesterol (mg/dL)",
hdl = "HDL cholesterol (mg/dL)",
non_hdl = "Non-HDL cholesterol (mg/dL)",
hba1c = "Hemoglobin A1c (%)",
bmi_cat = "BMI category",
bmi_cat_trend = "BMI category trend score",
whtr_cat = "Waist-to-height ratio category",
whtr_cat_trend = "WHtR category trend score",
sbp_cat = "Systolic blood pressure category",
sbp_cat_trend = "SBP category trend score",
dbp_cat = "Diastolic blood pressure category",
dbp_cat_trend = "DBP category trend score",
### Nutrient (vitamin D and DII)
vitD25oh = "Serum Vitamin D [25(OH)D] concentration (nmol/L)",
vitD25oh_cat = "Vitamin D status category",
vitD25oh_cat_trend = "Vitamin D status trend score",
dii = "Dietary Inflammatory Index (no alcohol)",
dii_quart = "DII quartile",
dii_quart_trend = "DII quartile trend score",
dii_etoh = "Dietary Inflammatory Index including alcohol",
### DII individual nutrient quartiles
kcal_quart = "Total energy intake (kcal) quartile",
carb_quart = "Carbohydrate intake quartile",
protein_quart = "Protein intake quartile",
totalfat_quart = "Total fat intake quartile",
satfat_quart = "Saturated fat intake quartile",
pufa_quart = "Polyunsaturated fat intake quartile",
mufa_quart = "Monounsaturated fat intake quartile",
n3fat_quart = "Omega-3 fat intake quartile",
n6fat_quart = "Omega-6 fat intake quartile",
choles_quart = "Cholesterol intake quartile",
vitA_quart = "Vitamin A intake quartile",
bcarotene_quart = "Beta-carotene intake quartile",
vitC_quart = "Vitamin C intake quartile",
vitE_quart = "Vitamin E intake quartile",
vitB6_quart = "Vitamin B6 intake quartile",
vitB12_quart = "Vitamin B12 intake quartile",
folicacid_quart = "Folic acid intake quartile",
thiamin_quart = "Thiamin intake quartile",
riboflavin_quart = "Riboflavin intake quartile",
niacin_quart = "Niacin intake quartile",
iron_quart = "Iron intake quartile",
mg_quart = "Magnesium intake quartile",
zn_quart = "Zinc intake quartile",
se_quart = "Selenium intake quartile",
fiber_quart = "Fiber intake quartile",
caffeine_quart = "Caffeine intake quartile",
### Outcomes and related variables
asthma = "Reported asthma",
ar = "Reported allergic rhinitis (hay fever)",
arthritis = "Reported rheumatoid or psoriatic arthritis",
diabetes = "Reported diabetes",
htn = "Reported hypertension",
ad = "Reported eczema (atopic dermatitis; 2005–2006 only)",
fa = "Reported food allergy",
fh_asthma = "Family history of asthma"
)
## Define variable categories
var_roles <- c(
### ID and design
seqn = "id",
sddsrvyr = "design",
cycle_str = "design",
sdmvpsu = "design",
sdmvstra = "design",
wtint2yr = "weight_2yr",
wtmec2yr = "weight_2yr",
wtint8yr = "weight_8yr",
wtmec8yr = "weight_8yr",
### Core predictors
age = "predictor_core",
sex = "predictor_core",
race = "predictor_core",
educ = "predictor_core",
pir = "predictor_core",
smoking = "predictor_core",
phys_act = "predictor_core",
phys_act_trend = "predictor_core",
food_sec = "predictor_core",
food_sec_score = "predictor_core",
insur = "predictor_core",
any_alcohol = "predictor_core",
alcohol = "predictor_core",
bmi = "predictor_core",
whtr = "predictor_core",
sbp_mean = "predictor_core",
dbp_mean = "predictor_core",
tchol = "predictor_core",
hdl = "predictor_core",
non_hdl = "predictor_core",
hba1c = "predictor_core",
bmi_cat = "predictor_core",
bmi_cat_trend = "predictor_core",
whtr_cat = "predictor_core",
whtr_cat_trend = "predictor_core",
sbp_cat = "predictor_core",
sbp_cat_trend = "predictor_core",
dbp_cat = "predictor_core",
dbp_cat_trend = "predictor_core",
### Vitamin D and DII
vitD25oh = "predictor_core",
vitD25oh_cat = "predictor_core",
vitD25oh_cat_trend = "predictor_core",
dii = "exposure_DII_main",
dii_quart = "exposure_DII_ordinal",
dii_quart_trend = "exposure_DII_trend",
dii_etoh = "exposure_DII_secondary",
### Nutrient quartiles
kcal_quart = "predictor_nutrient",
carb_quart = "predictor_nutrient",
protein_quart = "predictor_nutrient",
totalfat_quart = "predictor_nutrient",
satfat_quart = "predictor_nutrient",
pufa_quart = "predictor_nutrient",
mufa_quart = "predictor_nutrient",
n3fat_quart = "predictor_nutrient",
n6fat_quart = "predictor_nutrient",
choles_quart = "predictor_nutrient",
vitA_quart = "predictor_nutrient",
bcarotene_quart = "predictor_nutrient",
vitC_quart = "predictor_nutrient",
vitE_quart = "predictor_nutrient",
vitB6_quart = "predictor_nutrient",
vitB12_quart = "predictor_nutrient",
folicacid_quart = "predictor_nutrient",
thiamin_quart = "predictor_nutrient",
riboflavin_quart = "predictor_nutrient",
niacin_quart = "predictor_nutrient",
iron_quart = "predictor_nutrient",
mg_quart = "predictor_nutrient",
zn_quart = "predictor_nutrient",
se_quart = "predictor_nutrient",
fiber_quart = "predictor_nutrient",
caffeine_quart = "predictor_nutrient",
### Outcomes
asthma = "outcome_primary",
ar = "outcome_primary",
arthritis = "outcome_primary",
diabetes = "outcome_primary",
htn = "outcome_primary",
ad = "outcome_secondary",
fa = "outcome_secondary",
fh_asthma = "predictor_family_history"
)
## Helper function to generate variable summary for data dictionary table
make_var_summary <- function(data, vars = names(data),
var_labels = NULL) {
map_dfr(vars, function(v) {
x <- data[[v]]
n <- length(x)
n_missing <- sum(is.na(x))
pct_missing <- if (n > 0) n_missing / n else NA_real_
type <- if (is.factor(x)) {
"factor"
} else if (is.character(x)) {
"character"
} else if (is.numeric(x)) {
"numeric"
} else if (is.logical(x)) {
"logical"
} else {
paste(class(x), collapse = ",")
}
if (is.numeric(x)) {
rng <- suppressWarnings(range(x, na.rm = TRUE))
min_val <- if (length(rng) == 2 && all(is.finite(rng))) rng[1] else NA_real_
max_val <- if (length(rng) == 2 && all(is.finite(rng))) rng[2] else NA_real_
levels_info <- NA_character_
} else {
min_val <- NA_real_
max_val <- NA_real_
if (is.factor(x)) {
lv <- levels(x)
} else {
lv <- unique(x)
lv <- lv[!is.na(lv)]
lv <- as.character(lv)
}
lv <- head(lv, 20L)
levels_info <- if (length(lv) == 0) NA_character_ else paste(lv, collapse = "; ")
}
label <- if (!is.null(var_labels) && !is.null(var_labels[[v]])) {
var_labels[[v]]
} else {
v
}
tibble(
variable = v,
label = label,
type = type,
n = n,
n_missing = n_missing,
pct_missing = pct_missing,
min = min_val,
max = max_val,
levels = levels_info
)
})
}
## Variables to include in data dictionary table
vars_for_dict <- c(
# ID and design
"seqn","sddsrvyr","cycle_str","sdmvpsu","sdmvstra",
"wtint2yr","wtmec2yr","wtint8yr","wtmec8yr",
# Demographics
"age","sex","race","educ","pir",
# Psychosocial
"smoking","phys_act","phys_act_trend",
"food_sec",
"insur","alcohol",
# Anthropometric and cardiometabolic
"bmi","whtr","sbp_mean","dbp_mean",
"tchol","hdl","non_hdl","hba1c",
"bmi_cat","bmi_cat_trend",
"whtr_cat","whtr_cat_trend",
"sbp_cat","sbp_cat_trend",
"dbp_cat","dbp_cat_trend",
# Vitamin D and DII
"vitD25oh","vitD25oh_cat","vitD25oh_cat_trend",
"dii","dii_quart","dii_quart_trend",
"dii_etoh",
# Nutrient quartiles
"kcal_quart","carb_quart","protein_quart","totalfat_quart","satfat_quart",
"pufa_quart","mufa_quart","n3fat_quart","n6fat_quart","choles_quart",
"vitA_quart","bcarotene_quart","vitC_quart","vitE_quart",
"vitB6_quart","vitB12_quart","folicacid_quart","thiamin_quart",
"riboflavin_quart","niacin_quart","iron_quart","mg_quart","zn_quart","se_quart",
"fiber_quart","caffeine_quart",
# Outcomes
"asthma","ar","arthritis","diabetes","htn","ad","fa",
"fh_asthma"
)
## Create variable summary for table
var_summary <- make_var_summary(
data = nhanes_analytic,
vars = vars_for_dict,
var_labels = var_labels
)
## Grouping of variables for Table 1
table1_group_map <- c(
seqn = "Demographics",
sddsrvyr = "Demographics",
cycle_str = "Demographics",
sdmvpsu = "Demographics",
sdmvstra = "Demographics",
wtint2yr = "Demographics",
wtmec2yr = "Demographics",
wtint8yr = "Demographics",
wtmec8yr = "Demographics",
age = "Demographics",
sex = "Demographics",
race = "Demographics",
educ = "Demographics",
pir = "Demographics",
smoking = "Demographics",
phys_act = "Demographics",
phys_act_trend = "Demographics",
food_sec = "Demographics",
insur = "Demographics",
alcohol = "Demographics",
fh_asthma = "Demographics",
bmi = "Anthropometrics",
whtr = "Anthropometrics",
sbp_mean = "Anthropometrics",
dbp_mean = "Anthropometrics",
tchol = "Anthropometrics",
hdl = "Anthropometrics",
non_hdl = "Anthropometrics",
hba1c = "Anthropometrics",
bmi_cat = "Anthropometrics",
bmi_cat_trend = "Anthropometrics",
whtr_cat = "Anthropometrics",
whtr_cat_trend = "Anthropometrics",
sbp_cat = "Anthropometrics",
sbp_cat_trend = "Anthropometrics",
dbp_cat = "Anthropometrics",
dbp_cat_trend = "Anthropometrics",
vitD25oh = "Nutrients",
vitD25oh_cat = "Nutrients",
vitD25oh_cat_trend = "Nutrients",
dii = "Nutrients",
dii_quart = "Nutrients",
dii_quart_trend = "Nutrients",
dii_etoh = "Nutrients",
kcal_quart = "Nutrients",
carb_quart = "Nutrients",
protein_quart = "Nutrients",
totalfat_quart = "Nutrients",
satfat_quart = "Nutrients",
pufa_quart = "Nutrients",
mufa_quart = "Nutrients",
n3fat_quart = "Nutrients",
n6fat_quart = "Nutrients",
choles_quart = "Nutrients",
vitA_quart = "Nutrients",
bcarotene_quart = "Nutrients",
vitC_quart = "Nutrients",
vitE_quart = "Nutrients",
vitB6_quart = "Nutrients",
vitB12_quart = "Nutrients",
folicacid_quart = "Nutrients",
thiamin_quart = "Nutrients",
riboflavin_quart = "Nutrients",
niacin_quart = "Nutrients",
iron_quart = "Nutrients",
mg_quart = "Nutrients",
zn_quart = "Nutrients",
se_quart = "Nutrients",
fiber_quart = "Nutrients",
caffeine_quart = "Nutrients",
asthma = "Disease Outcomes",
ar = "Disease Outcomes",
arthritis = "Disease Outcomes",
diabetes = "Disease Outcomes",
htn = "Disease Outcomes",
ad = "Disease Outcomes",
fa = "Disease Outcomes"
)
## Define helper function to identify binary numeric variables
identify_binary <- function(x) {
vals <- unique(na.omit(x))
length(vals) <= 2 && all(vals %in% c(0, 1))
}
## Prepare Table 1 (data dictionary table)
table1_df <- var_summary |>
mutate(
### Binary 0/1 variable handling
binary_flag = map_lgl(variable, ~ identify_binary(nhanes_analytic[[.x]])),
Type = case_when(
binary_flag ~ "binary",
type == "numeric" ~ "numeric",
TRUE ~ type
),
N = as.integer(n),
N_missing = as.integer(n_missing),
group = table1_group_map[variable]
) |>
select(
group,
Variable = variable,
Content = label,
Type,
N,
N_missing,
pct_missing,
Min = min,
Max = max,
Levels = levels,
binary_flag
) |>
mutate(
Min = if_else(binary_flag, 0, Min),
Max = if_else(binary_flag, 1, Max),
Levels = if_else(binary_flag, "0=No; 1=Yes", Levels),
### Fine-tune levels of select ariables
Levels = case_when(
Variable == "sddsrvyr" ~ "4=2005–06; 5=2007–08; 6=2009–10; 7=2011–12",
Variable == "bmi_cat" ~ "Underweight (<18.5); Normal (18.5–24.9); Overweight (25–29.9); Obesity I (30–34.9); Obesity II (35–39.9); Obesity III (≥40)",
Variable == "vitD25oh_cat" ~ "<20 (deficient); 20–29 (insufficient); ≥30 (sufficient)",
Variable == "educ" ~ "<9 years; 9–12 years; >12 years",
TRUE ~ Levels
)
) |>
arrange(
factor(
group,
levels = c("Demographics", "Anthropometrics", "Nutrients", "Disease Outcomes")
),
Variable
) |>
select(
group,
Variable,
Content,
Type,
N,
`N missing` = N_missing,
`% missing` = pct_missing,
Min,
Max,
Levels
)
## Generate Table 1
labs_vars <- c("bmi","whtr","tchol","hdl","non_hdl","hba1c","vitD25oh","dii","dii_etoh")
nhanes_table1_gt <- table1_df |>
gt(groupname_col = "group") |>
tab_header(
title = md("**Table 1: Extracted Variables: Types, Distributions, and Missingness**")
) |>
cols_label(
N = md("N"),
`N missing` = md("N missing"),
`% missing` = md("% missing")
) |>
### % missing always with 3 decimals
fmt_number(
columns = `% missing`,
decimals = 3
) |>
### Min/Max for selected continuous variables: 1 decimal
fmt_number(
columns = c(Min, Max),
decimals = 1,
rows = Variable %in% labs_vars & Type != "binary"
) |>
### Min/Max for all other non-binary numeric variables: 0 decimals
fmt_number(
columns = c(Min, Max),
decimals = 0,
rows = !(Variable %in% labs_vars) & Type != "binary"
) |>
### Min/Max for binary 0/1 variables: 0 decimals (0 and 1)
fmt_number(
columns = c(Min, Max),
decimals = 0,
rows = Type == "binary"
) |>
### Supress seqn separators
fmt_number(
columns = c(Min, Max),
decimals = 0,
use_seps = FALSE,
rows = Variable == "seqn"
) |>
cols_align(align = "left") |>
tab_options(table.font.size = px(12)) |>
### Bold column headings and row group labels
tab_style(
style = list(
cell_text(weight = "bold")
),
locations = list(
cells_column_labels(everything()),
cells_row_groups()
)
)
# Additional adjustments for regression analyses-----------------------------------------------
## Add trend variables to outcome-specific survey designs
add_trend_vars_to_design <- function(design) {
vars <- design$variables
if ("vitD25oh_cat" %in% names(vars) && !"vitD25oh_cat_trend" %in% names(vars)) {
vars <- vars |>
mutate(
vitD25oh_cat_trend = if_else(
!is.na(vitD25oh_cat),
as.numeric(vitD25oh_cat) - 1,
NA_real_
)
)
}
if ("dii_quart" %in% names(vars) && !"dii_quart_trend" %in% names(vars)) {
vars <- vars |>
mutate(
dii_quart_trend = if_else(
!is.na(dii_quart),
as.numeric(dii_quart) - 1,
NA_real_
)
)
}
design$variables <- vars
design
}
des_asthma_reg <- add_trend_vars_to_design(des_asthma_reg)
des_ar_reg <- add_trend_vars_to_design(des_ar_reg)
des_inflam_reg <- add_trend_vars_to_design(des_inflam_reg)
des_htn_reg <- add_trend_vars_to_design(des_htn_reg)
des_dm_reg <- add_trend_vars_to_design(des_dm_reg)
des_ad_reg <- add_trend_vars_to_design(des_ad_reg)
des_fa_reg <- add_trend_vars_to_design(des_fa_reg)
## Full cohort design for characteristics and prevalence
make_design_wt8yr <- function(data) {
svydesign(
ids = ~sdmvpsu,
strata = ~sdmvstra,
weights = ~wtmec8yr,
nest = TRUE,
data = data
) |>
subset(wtmec8yr > 0)
}
des_full <- make_design_wt8yr(nhanes_analytic)
## Outcome lists
outcome_list <- list(
asthma = list(
var = "asthma",
design = des_asthma_reg,
label = "Asthma"
),
ar = list(
var = "ar",
design = des_ar_reg,
label = "Allergic rhinitis"
),
inflam = list(
var = "arthritis",
design = des_inflam_reg,
label = "Inflammatory arthritis"
),
htn = list(
var = "htn",
design = des_htn_reg,
label = "Hypertension"
),
dm = list(
var = "diabetes",
design = des_dm_reg,
label = "Diabetes"
)
)
outcome_list_secondary <- list(
ad = list(
var = "ad",
design = des_ad_reg,
label = "Atopic dermatitis"
),
fa = list(
var = "fa",
design = des_fa_reg,
label = "Food allergy"
)
)
## Labels for core covariates and DII-related predictors
uni1_labels <- c(
age = "Age (years)",
sex = "Sex",
race = "Race/ethnicity",
insur = "Insurance",
educ = "Education",
pir = "Poverty-income ratio",
food_sec = "Food security",
smoking = "Smoking status",
phys_act = "Physical activity",
alcohol = "Alcohol use",
sbp_cat = "Systolic blood pressure category",
dbp_cat = "Diastolic blood pressure category",
bmi_cat = "BMI category",
whtr_cat = "Waist-to-height ratio category",
hdl = "HDL cholesterol (mg/dL)",
non_hdl = "Non-HDL cholesterol (mg/dL)",
tchol = "Total cholesterol (mg/dL)",
hba1c = "HbA1c (%)",
vitD25oh_cat = "25(OH)D category",
vitD25oh_cat_trend = "25(OH)D category trend (0–2)",
dii = "Dietary Inflammatory Index (per unit)",
dii_quart_trend = "DII quartile trend (0–3)",
dii_etoh = "DII including alcohol (per unit)"
)
## Mapping of factor levels to labels for characteristics and univariable models
custom_order_uni1 <- tribble(
~predictor, ~level, ~variable_label, ~rank,
### Sex
"sex", "Male", "Sex: Male", 1,
"sex", "Female", "Sex: Female", 2,
### Race
"race", "Non-Hispanic White", "Race/ethnicity: Non-Hispanic White", 1,
"race", "Non-Hispanic Black", "Race/ethnicity: Non-Hispanic Black", 2,
"race", "Mexican American", "Race/ethnicity: Mexican American", 3,
"race", "Other Hispanic", "Race/ethnicity: Other Hispanic", 4,
"race", "Other/multiracial", "Race/ethnicity: Other race", 5,
### Insurance
"insur", "No", "Insurance: No", 1,
"insur", "Yes", "Insurance: Yes", 2,
### Education
"educ", "<9 years", "Education: <9", 1,
"educ", "9–12 years", "Education: 9-12", 2,
"educ", ">12 years", "Education: >12", 3,
### Food security
"food_sec", "Full", "Food security: Full", 1,
"food_sec", "Marginal", "Food security: Marginal", 2,
"food_sec", "Low", "Food security: Low", 3,
"food_sec", "Very low", "Food security: Very low", 4,
### Smoking
"smoking", "Never", "Smoking status: Never", 1,
"smoking", "Current", "Smoking status: Current", 2,
"smoking", "Former", "Smoking status: Former", 3,
### Alcohol
"alcohol", "No", "Alcohol use: No", 1,
"alcohol", "Yes", "Alcohol use: Yes", 2,
### Physical activity
"phys_act", "Sedentary", "Physical activity: Sedentary", 1,
"phys_act", "Moderate", "Physical activity: Moderate", 2,
"phys_act", "Intense", "Physical activity: Intense", 3,
### SBP categories
"sbp_cat", "<120", "Systolic blood pressure category: <120", 1,
"sbp_cat", "120–129", "Systolic blood pressure category: 120-129", 2,
"sbp_cat", "130–139", "Systolic blood pressure category: 130-139", 3,
"sbp_cat", "140–159", "Systolic blood pressure category: 140-159", 4,
"sbp_cat", "≥160", "Systolic blood pressure category: >=160", 5,
### DBP categories
"dbp_cat", "<80", "Diastolic blood pressure category: <80", 1,
"dbp_cat", "80–89", "Diastolic blood pressure category: 80-89", 2,
"dbp_cat", "90–99", "Diastolic blood pressure category: 90-99", 3,
"dbp_cat", "≥100", "Diastolic blood pressure category: >=100", 4,
### BMI categories
"bmi_cat", "Underweight (<18.5)", "BMI category: Underweight", 1,
"bmi_cat", "Normal (18.5–24.9)", "BMI category: Normal weight", 2,
"bmi_cat", "Overweight (25–29.9)", "BMI category: Overweight", 3,
"bmi_cat", "Obesity I (30–34.9)", "BMI category: Obesity I", 4,
"bmi_cat", "Obesity II (35–39.9)", "BMI category: Obesity II", 5,
"bmi_cat", "Obesity III (≥40)", "BMI category: Obesity III", 6,
### WHtR categories
"whtr_cat", "<0.5", "Waist-to-height ratio category: <0.5", 1,
"whtr_cat", "0.5–0.59", "Waist-to-height ratio category: 0.5-0.59", 2,
"whtr_cat", "≥0.6", "Waist-to-height ratio category: >=0.6", 3,
### Vitamin D categories
"vitD25oh_cat", "<20 (deficient)", "25(OH)D category: <20", 1,
"vitD25oh_cat", "20–29 (insufficient)", "25(OH)D category: 20-29", 2,
"vitD25oh_cat", "≥30 (sufficient)", "25(OH)D category: >=30", 3
)
is_poly_term <- function(terms, predictor) {
any(grepl(paste0("^", predictor, "\\."), terms))
}
add_ref_and_label <- function(td, des, predictor, outcome_name, outcome_info, base_label = NULL) {
if (is.null(base_label)) {
base_label <- predictor
}
var_data <- des$variables[[predictor]]
### Numeric predictors have no reference row, only single row with label
if (is.numeric(var_data) || is.integer(var_data)) {
td <- td |>
mutate(variable_label = base_label)
return(td)
}
### Character predictors converted to factor
if (is.character(var_data)) {
var_data <- factor(var_data)
}
### Detect polynomial terms and assign linear or quadratic trend
if (is_poly_term(td$term, predictor)) {
td <- td |>
mutate(
variable_label = case_when(
grepl("\\.L$", term) ~ paste0(base_label, ": linear trend"),
grepl("\\.Q$", term) ~ paste0(base_label, ": quadratic trend"),
TRUE ~ paste0(base_label, ": ", term)
)
)
return(td)
}
### Add reference row and level labels for factor predictors
if (is.factor(var_data)) {
levels_vec <- levels(var_data)
pattern <- paste0("^", predictor)
td_nonref <- td |>
mutate(
level = sub(pattern, "", term),
variable_label = paste0(base_label, ": ", level)
)
ref_level <- levels_vec[1]
ref_label <- paste0(base_label, ": ", ref_level)
ref_row <- tibble(
term = paste0(predictor, ref_level),
estimate = 1,
conf.low = 1,
conf.high = 1,
p.value = NA_real_,
outcome = outcome_name,
outcome_lab = outcome_info$label,
predictor = predictor,
level = ref_level,
variable_label = ref_label
)
td_full <- bind_rows(ref_row, td_nonref)
return(td_full)
}
td |>
mutate(variable_label = base_label)
}
# Print Table 1 variable summary---------------------------------------------------------------
nhanes_table1_gt| Table 1: Extracted Variables: Types, Distributions, and Missingness | ||||||||
|---|---|---|---|---|---|---|---|---|
| Variable | Content | Type | N | N missing | Min | Max | Levels | |
| Demographics | ||||||||
| age | Age in years | numeric | 7803 | 0 | 0.000 | 20 | 40 | NA |
| alcohol | Alcohol use | factor | 7803 | 1192 | 0.153 | NA | NA | No; Yes |
| cycle_str | Survey cycle label | character | 7803 | 0 | 0.000 | NA | NA | 2005-2006; 2007-2008; 2009-2010; 2011-2012 |
| educ | Education level | factor | 7803 | 7 | 0.001 | NA | NA | <9 years; 9–12 years; >12 years |
| fh_asthma | Family history of asthma | binary | 7803 | 131 | 0.017 | 0 | 1 | 0=No; 1=Yes |
| food_sec | Household food security status | factor | 7803 | 3052 | 0.391 | NA | NA | Full; Marginal; Low; Very low |
| insur | Any health insurance coverage | factor | 7803 | 7 | 0.001 | NA | NA | No; Yes |
| phys_act | Recreational physical activity level | factor | 7803 | 21 | 0.003 | NA | NA | Sedentary; Moderate; Intense |
| phys_act_trend | Physical activity trend score | numeric | 7803 | 21 | 0.003 | 0 | 2 | NA |
| pir | Poverty-income ratio | numeric | 7803 | 565 | 0.072 | 0 | 5 | NA |
| race | Race/ethnicity | factor | 7803 | 742 | 0.095 | NA | NA | Non-Hispanic White; Non-Hispanic Black; Mexican American; Other Hispanic; Other/multiracial |
| sddsrvyr | Survey cycle | numeric | 7803 | 0 | 0.000 | 4 | 7 | 4=2005–06; 5=2007–08; 6=2009–10; 7=2011–12 |
| sdmvpsu | Masked variance pseudo-PSU | numeric | 7803 | 0 | 0.000 | 1 | 3 | NA |
| sdmvstra | Masked variance pseudo-stratum | numeric | 7803 | 0 | 0.000 | 44 | 103 | NA |
| seqn | Participant ID | numeric | 7803 | 0 | 0.000 | 31144 | 71912 | NA |
| sex | Sex | factor | 7803 | 0 | 0.000 | NA | NA | Male; Female |
| smoking | Smoking status | factor | 7803 | 7 | 0.001 | NA | NA | Never; Former; Current |
| wtint2yr | Interview weight (2-year) | numeric | 7803 | 0 | 0.000 | 1,339 | 152,859 | NA |
| wtint8yr | Interview weight (8-year pooled) | numeric | 7803 | 0 | 0.000 | 335 | 38,215 | NA |
| wtmec2yr | MEC exam weight (2-year) | numeric | 7803 | 0 | 0.000 | 0 | 163,393 | NA |
| wtmec8yr | MEC exam weight (8-year pooled) | numeric | 7803 | 0 | 0.000 | 0 | 40,848 | NA |
| Anthropometrics | ||||||||
| bmi | Body mass index (kg/m^2) | numeric | 7803 | 347 | 0.044 | 14.7 | 76.1 | NA |
| bmi_cat | BMI category | factor | 7803 | 347 | 0.044 | NA | NA | Underweight (<18.5); Normal (18.5–24.9); Overweight (25–29.9); Obesity I (30–34.9); Obesity II (35–39.9); Obesity III (≥40) |
| bmi_cat_trend | BMI category trend score | numeric | 7803 | 347 | 0.044 | 0 | 5 | NA |
| dbp_cat | Diastolic blood pressure category | factor | 7803 | 715 | 0.092 | NA | NA | <80; 80–89; 90–99; ≥100 |
| dbp_cat_trend | DBP category trend score | numeric | 7803 | 715 | 0.092 | 0 | 3 | NA |
| dbp_mean | Mean diastolic blood pressure (mmHg) | numeric | 7803 | 715 | 0.092 | 3 | 125 | NA |
| hba1c | Hemoglobin A1c (%) | numeric | 7803 | 770 | 0.099 | 3.6 | 15.9 | NA |
| hdl | HDL cholesterol (mg/dL) | numeric | 7803 | 811 | 0.104 | 7.0 | 149.0 | NA |
| non_hdl | Non-HDL cholesterol (mg/dL) | numeric | 7803 | 811 | 0.104 | 27.0 | 494.0 | NA |
| sbp_cat | Systolic blood pressure category | factor | 7803 | 708 | 0.091 | NA | NA | <120; 120–129; 130–139; 140–159; ≥160 |
| sbp_cat_trend | SBP category trend score | numeric | 7803 | 708 | 0.091 | 0 | 4 | NA |
| sbp_mean | Mean systolic blood pressure (mmHg) | numeric | 7803 | 708 | 0.091 | 80 | 196 | NA |
| tchol | Total cholesterol (mg/dL) | numeric | 7803 | 811 | 0.104 | 59.0 | 523.0 | NA |
| whtr | Waist-to-height ratio | numeric | 7803 | 633 | 0.081 | 0.4 | 1.1 | NA |
| whtr_cat | Waist-to-height ratio category | factor | 7803 | 633 | 0.081 | NA | NA | <0.5; 0.5–0.59; ≥0.6 |
| whtr_cat_trend | WHtR category trend score | numeric | 7803 | 633 | 0.081 | 0 | 2 | NA |
| Nutrients | ||||||||
| bcarotene_quart | Beta-carotene intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| caffeine_quart | Caffeine intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| carb_quart | Carbohydrate intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| choles_quart | Cholesterol intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| dii | Dietary Inflammatory Index (no alcohol) | numeric | 7803 | 745 | 0.095 | −5.7 | 8.1 | NA |
| dii_etoh | Dietary Inflammatory Index including alcohol | numeric | 7803 | 745 | 0.095 | −5.7 | 8.3 | NA |
| dii_quart | DII quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1 (most anti-inflammatory); Q2; Q3; Q4 (most pro-inflammatory) |
| dii_quart_trend | DII quartile trend score | numeric | 7803 | 745 | 0.095 | 0 | 3 | NA |
| fiber_quart | Fiber intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| folicacid_quart | Folic acid intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| iron_quart | Iron intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| kcal_quart | Total energy intake (kcal) quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| mg_quart | Magnesium intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| mufa_quart | Monounsaturated fat intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| n3fat_quart | Omega-3 fat intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| n6fat_quart | Omega-6 fat intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| niacin_quart | Niacin intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| protein_quart | Protein intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| pufa_quart | Polyunsaturated fat intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| riboflavin_quart | Riboflavin intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| satfat_quart | Saturated fat intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| se_quart | Selenium intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| thiamin_quart | Thiamin intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| totalfat_quart | Total fat intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| vitA_quart | Vitamin A intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| vitB12_quart | Vitamin B12 intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| vitB6_quart | Vitamin B6 intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| vitC_quart | Vitamin C intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| vitD25oh | Serum Vitamin D [25(OH)D] concentration (nmol/L) | numeric | 7803 | 1054 | 0.135 | 6.2 | 213.0 | NA |
| vitD25oh_cat | Vitamin D status category | factor | 7803 | 1054 | 0.135 | NA | NA | <20 (deficient); 20–29 (insufficient); ≥30 (sufficient) |
| vitD25oh_cat_trend | Vitamin D status trend score | numeric | 7803 | 1054 | 0.135 | 0 | 2 | NA |
| vitE_quart | Vitamin E intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| zn_quart | Zinc intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| Disease Outcomes | ||||||||
| ad | Reported eczema (atopic dermatitis; 2005–2006 only) | binary | 7803 | 6108 | 0.783 | 0 | 1 | 0=No; 1=Yes |
| ar | Reported allergic rhinitis (hay fever) | binary | 7803 | 20 | 0.003 | 0 | 1 | 0=No; 1=Yes |
| arthritis | Reported rheumatoid or psoriatic arthritis | binary | 7803 | 188 | 0.024 | 0 | 1 | 0=No; 1=Yes |
| asthma | Reported asthma | binary | 7803 | 6 | 0.001 | 0 | 1 | 0=No; 1=Yes |
| diabetes | Reported diabetes | binary | 7803 | 7 | 0.001 | 0 | 1 | 0=No; 1=Yes |
| fa | Reported food allergy | binary | 7803 | 3713 | 0.476 | 0 | 1 | 0=No; 1=Yes |
| htn | Reported hypertension | binary | 7803 | 14 | 0.002 | 0 | 1 | 0=No; 1=Yes |
4 Results
Cohort characteristics
Cohort characteristics are shown in Table 2 for the full cohort (2005–2012) and in Table 3 for the time-restricted cohorts for eczema (2005–2006) and food allergy (2007–2010). In general, participants with asthma, allergic rhinitis, eczema, and food allergy were similar to the respective full cohort populations, with modest differences (e.g., higher proportions of males and small variations in physical activity and smoking patterns). Patients with allergic rhinitis, inflammatory arthritis, diabetes, and hypertension were slightly older than the full cohort. Those with inflammatory arthritis, diabetes, and hypertension also exhibited higher DII scores, while participants with allergic rhinitis had slightly lower DII scores relative to the full cohort. There were smaller differences in DII scores in the eczema and food allergy cohorts relative to their respective time-restricted full cohorts. These differences were formally assessed in subsequent regression analyses.
# Continuous covariates and categorical levels ------------------------------------------------
## Continuous characteristics
char_continuous <- c("age", "pir", "hdl", "non_hdl", "tchol", "hba1c", "dii", "dii_etoh")
## Predictor order
char_order <- c(
"age",
"sex",
"race",
"insur",
"educ",
"pir",
"food_sec",
"smoking",
"phys_act",
"alcohol",
"sbp_cat",
"dbp_cat",
"bmi_cat",
"whtr_cat",
"hdl",
"non_hdl",
"tchol",
"hba1c",
"vitD25oh_cat",
"dii",
"dii_etoh"
)
## Characteristics to include
char_skeleton_raw <- bind_rows(
tibble(
predictor = char_continuous,
level = NA_character_,
variable_label = uni1_labels[char_continuous],
type = "continuous",
rank = NA_integer_
),
custom_order_uni1 |>
mutate(type = "categorical")
)
## Characteristic ordering
char_skeleton <- char_skeleton_raw |>
mutate(
predictor = factor(
predictor,
levels = c(char_order, setdiff(unique(predictor), char_order))
)
) |>
arrange(predictor, rank) |>
mutate(
predictor = as.character(predictor)
)
## Helper function to compute characteristics for a single survey design
compute_char_for_group <- function(design, group_name) {
vars <- design$variables
map_dfr(
seq_len(nrow(char_skeleton)),
function(i) {
row <- char_skeleton[i, ]
predictor <- row$predictor
label <- row$variable_label
type <- row$type
lvl <- row$level
# If predictor not present, return NA
if (!predictor %in% names(vars)) {
return(tibble(
Group = group_name,
Characteristic = label,
Value = NA_character_
))
}
if (type == "continuous") {
# Continuous: survey-weighted mean (SE)
f <- as.formula(paste0("~", predictor))
est <- svymean(f, design, na.rm = TRUE)
m <- as.numeric(est)[1]
se <- as.numeric(SE(est))[1]
value <- sprintf("%.2f (%.2f)", m, se)
} else {
### Categorical: survey-weighted prevalence (SE) for each level
var <- vars[[predictor]]
### Coerce to factor and drop unused levels
if (!is.factor(var)) {
var <- factor(var)
} else {
var <- droplevels(var)
}
### If target level is missing or not in factor levels, return NA
if (is.na(lvl) || !lvl %in% levels(var)) {
return(tibble(
Group = group_name,
Characteristic = label,
Value = NA_character_
))
}
### Restrict to non-missing values for predictor
nonmiss <- !is.na(var)
if (!any(nonmiss)) {
return(tibble(
Group = group_name,
Characteristic = label,
Value = NA_character_
))
}
### Subset design to non-missing rows for predictor
design_nm <- subset(design, nonmiss)
### Compute vars for subsetted design and attach indicator
vars_nm <- design_nm$variables
vars_nm$indicator <- as.numeric(var[nonmiss] == lvl)
design_nm$variables <- vars_nm
est <- tryCatch(
svymean(~indicator, design_nm, na.rm = TRUE),
error = function(e) NA
)
if (all(is.na(est))) {
value <- NA_character_
} else {
p <- as.numeric(est)[1] * 100
se <- as.numeric(SE(est))[1] * 100
value <- sprintf("%.1f%% (%.1f)", p, se)
}
}
tibble(
Group = group_name,
Characteristic = label,
Value = value
)
}
)
}
# Full cohort and primary outcomes-------------------------------------------------------------
des_full_primary <- des_full
des_asthma_cases <- subset(des_asthma_reg, asthma == 1)
des_ar_cases <- subset(des_ar_reg, ar == 1)
des_inflam_cases <- subset(des_inflam_reg, arthritis == 1)
des_dm_cases <- subset(des_dm_reg, diabetes == 1)
des_htn_cases <- subset(des_htn_reg, htn == 1)
group_designs_primary <- list(
"Full cohort" = des_full_primary,
"Asthma" = des_asthma_cases,
"Allergic rhinitis" = des_ar_cases,
"Inflammatory arthritis" = des_inflam_cases,
"Diabetes" = des_dm_cases,
"Hypertension" = des_htn_cases
)
## Unweighted Ns for primary groups
group_ns_primary <- map_int(group_designs_primary, ~ nrow(.x$variables))
group_labels_primary <- setNames(
paste0(names(group_designs_primary), " (N=", group_ns_primary, ")"),
names(group_designs_primary)
)
char_primary_long <- imap_dfr(
group_designs_primary,
~ compute_char_for_group(design = .x, group_name = .y)
)
## Rename characteristics for DII and Vitamin D
char_primary_wide <- char_primary_long |>
mutate(
Characteristic = dplyr::recode(
Characteristic,
"Dietary Inflammatory Index (per unit)" = "Dietary Inflammatory Index (DII)",
"DII including alcohol (per unit)" = "DII including alcohol"
),
Characteristic = gsub(
"25\\(OH\\)D category",
"Vitamin D category",
Characteristic
)
) |>
pivot_wider(
id_cols = Characteristic,
names_from = Group,
values_from = Value
) |>
### Add unweighted N to column labels
rename_with(
~ group_labels_primary[.x],
.cols = any_of(names(group_labels_primary))
)
tbl_char_primary <- char_primary_wide |>
gt() |>
tab_header(
title = md(
"**Table 2: Characteristics of Full Cohort (2005–2012) and Participants with Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension**"
)
) |>
fmt_markdown(columns = everything()) |>
tab_options(
table.font.size = px(10),
column_labels.font.weight = "bold"
) |>
tab_footnote(
footnote = "For each characteristic, survey-weighted means (SE) are shown for continuous variables, and percentages of total within that category are shown for categorical variables.",
locations = cells_title(groups = "title")
)
# Time-restricted eczema and food allergy cohorts----------------------------------------------
ad_cohort <- nhanes_analytic |>
filter(sddsrvyr == 4, !is.na(ad))
fa_cohort <- nhanes_analytic |>
filter(sddsrvyr %in% c(5, 6), !is.na(fa))
des_ad_cohort <- make_design_wt8yr(ad_cohort)
des_fa_cohort <- make_design_wt8yr(fa_cohort)
des_ad_cases <- subset(des_ad_reg, ad == 1)
des_fa_cases <- subset(des_fa_reg, fa == 1)
group_designs_secondary <- list(
"Cohort (2005–2006)" = des_ad_cohort,
"Eczema" = des_ad_cases,
"Cohort (2007–2010)" = des_fa_cohort,
"Food allergy" = des_fa_cases
)
## Unweighted Ns for secondary groups
group_ns_secondary <- map_int(group_designs_secondary, ~ nrow(.x$variables))
group_labels_secondary <- setNames(
paste0(names(group_designs_secondary), " (N=", group_ns_secondary, ")"),
names(group_designs_secondary)
)
char_secondary_long <- imap_dfr(
group_designs_secondary,
~ compute_char_for_group(design = .x, group_name = .y)
)
## Rename characteristics for DII and Vitamin D
char_secondary_wide <- char_secondary_long |>
mutate(
Characteristic = dplyr::recode(
Characteristic,
"Dietary Inflammatory Index (per unit)" = "Dietary Inflammatory Index (DII)",
"DII including alcohol (per unit)" = "DII including alcohol"
),
Characteristic = gsub(
"25\\(OH\\)D category",
"Vitamin D category",
Characteristic
)
) |>
pivot_wider(
id_cols = Characteristic,
names_from = Group,
values_from = Value
) |>
### Add unweighted N to column labels
rename_with(
~ group_labels_secondary[.x],
.cols = any_of(names(group_labels_secondary))
)
tbl_char_secondary <- char_secondary_wide |>
gt() |>
tab_header(
title = md(
"**Table 3: Characteristics of Time-Restricted Subcohorts and Participants with Eczema (2005–2006) and Food Allergy (2007–2010)**"
)
) |>
fmt_markdown(columns = everything()) |>
tab_options(
table.font.size = px(10),
column_labels.font.weight = "bold"
) |>
tab_footnote(
footnote = "For each characteristic, survey-weighted means (SE) are shown for continuous variables, and percentages of total within that category are shown for categorical variables.",
locations = cells_title(groups = "title")
)
# Print cohort characteristic tables-----------------------------------------------------------
tbl_char_primary| Table 2: Characteristics of Full Cohort (2005–2012) and Participants with Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension1 | ||||||
|---|---|---|---|---|---|---|
| Characteristic | Full cohort (N=7520) | Asthma (N=1166) | Allergic rhinitis (N=824) | Inflammatory arthritis (N=100) | Diabetes (N=181) | Hypertension (N=878) |
| Age (years) | 30.03 (0.15) | 29.18 (0.26) | 32.09 (0.22) | 33.26 (0.71) | 33.55 (0.44) | 32.77 (0.24) |
| Sex: Male | 51.3% (0.6) | 45.1% (1.7) | 44.2% (1.6) | 31.7% (4.3) | 45.9% (4.8) | 55.4% (1.8) |
| Sex: Female | 48.7% (0.6) | 54.9% (1.7) | 55.8% (1.6) | 68.3% (4.3) | 54.1% (4.8) | 44.6% (1.8) |
| Race/ethnicity: Non-Hispanic White | 65.9% (2.0) | 71.6% (2.3) | 77.7% (2.0) | 74.2% (4.6) | 55.9% (4.7) | 63.2% (2.9) |
| Race/ethnicity: Non-Hispanic Black | 13.8% (1.0) | 15.4% (1.5) | 10.7% (1.1) | 14.2% (3.6) | 22.5% (3.3) | 20.7% (2.0) |
| Race/ethnicity: Mexican American | 13.1% (1.2) | 6.6% (1.0) | 5.9% (0.9) | 7.6% (2.2) | 11.6% (2.2) | 10.3% (1.3) |
| Race/ethnicity: Other Hispanic | 7.3% (0.9) | 6.3% (1.1) | 5.7% (1.1) | 4.0% (1.7) | 10.1% (2.5) | 5.8% (1.1) |
| Race/ethnicity: Other race | NA | NA | NA | NA | NA | NA |
| Insurance: No | 30.5% (1.0) | 26.7% (1.6) | 23.9% (1.9) | 16.7% (3.8) | 24.0% (3.7) | 25.6% (1.6) |
| Insurance: Yes | 69.5% (1.0) | 73.3% (1.6) | 76.1% (1.9) | 83.3% (3.8) | 76.0% (3.7) | 74.4% (1.6) |
| Education: <9 | 4.6% (0.4) | 1.9% (0.5) | 2.5% (0.6) | 7.2% (3.2) | 6.0% (2.1) | 3.4% (0.8) |
| Education: 9-12 | 34.4% (1.3) | 34.5% (2.0) | 26.4% (2.0) | 38.6% (5.5) | 39.1% (4.4) | 37.1% (1.9) |
| Education: >12 | 61.0% (1.4) | 63.6% (2.2) | 71.1% (2.2) | 54.3% (7.1) | 54.9% (4.8) | 59.5% (2.2) |
| Poverty-income ratio | 2.70 (0.05) | 2.59 (0.09) | 2.96 (0.09) | 2.24 (0.17) | 2.34 (0.13) | 2.58 (0.07) |
| Food security: Full | 81.8% (0.9) | 79.4% (1.9) | 85.3% (1.6) | 70.5% (4.6) | 72.9% (5.4) | 77.7% (2.3) |
| Food security: Marginal | 13.0% (0.8) | 14.1% (1.8) | 11.0% (1.2) | 21.9% (5.0) | 19.0% (5.1) | 17.2% (2.0) |
| Food security: Low | 3.5% (0.3) | 3.1% (0.5) | 2.5% (0.8) | 3.1% (2.0) | 3.4% (1.8) | 3.3% (0.6) |
| Food security: Very low | 1.7% (0.3) | 3.5% (0.5) | 1.2% (0.3) | 4.5% (2.5) | 4.7% (2.7) | 1.8% (0.7) |
| Smoking status: Never | 58.8% (1.1) | 53.1% (2.2) | 57.5% (2.4) | 39.8% (5.3) | 59.1% (4.2) | 49.1% (2.0) |
| Smoking status: Current | 26.9% (0.9) | 33.6% (2.0) | 25.4% (1.9) | 46.6% (6.0) | 28.1% (4.0) | 34.0% (2.1) |
| Smoking status: Former | 14.3% (0.6) | 13.4% (1.3) | 17.1% (1.8) | 13.6% (4.1) | 12.8% (3.6) | 16.9% (1.5) |
| Physical activity: Sedentary | 34.5% (1.0) | 32.2% (1.5) | 29.3% (1.7) | 37.4% (5.1) | 44.9% (3.8) | 38.0% (2.1) |
| Physical activity: Moderate | 24.4% (0.7) | 25.3% (1.6) | 28.8% (1.8) | 38.8% (5.3) | 32.9% (4.7) | 27.5% (1.6) |
| Physical activity: Intense | 41.1% (1.2) | 42.4% (2.1) | 41.9% (2.2) | 23.8% (4.9) | 22.2% (3.8) | 34.5% (2.0) |
| Alcohol use: No | 19.0% (0.7) | 16.2% (1.4) | 16.5% (1.5) | 25.1% (4.3) | 26.9% (3.7) | 17.4% (1.3) |
| Alcohol use: Yes | 81.0% (0.7) | 83.8% (1.4) | 83.5% (1.5) | 74.9% (4.3) | 73.1% (3.7) | 82.6% (1.3) |
| Systolic blood pressure category: <120 | 68.5% (0.9) | 67.4% (1.5) | 70.6% (1.8) | 74.2% (5.9) | 42.6% (5.3) | 38.5% (2.6) |
| Systolic blood pressure category: 120-129 | 21.3% (0.7) | 23.1% (1.4) | 20.2% (1.5) | 17.4% (5.6) | 37.3% (4.7) | 31.7% (2.3) |
| Systolic blood pressure category: 130-139 | 6.7% (0.4) | 6.1% (0.7) | 6.1% (1.0) | 8.4% (2.6) | 13.3% (3.4) | 16.8% (1.8) |
| Systolic blood pressure category: 140-159 | 3.3% (0.3) | 3.0% (0.6) | 2.9% (0.5) | NA | 5.1% (1.3) | 10.9% (1.2) |
| Systolic blood pressure category: >=160 | 0.3% (0.1) | 0.4% (0.2) | 0.2% (0.1) | NA | 1.6% (1.2) | 2.1% (0.4) |
| Diastolic blood pressure category: <80 | 84.9% (0.8) | 84.1% (1.3) | 82.8% (1.6) | 83.5% (4.3) | 66.3% (4.1) | 64.2% (2.5) |
| Diastolic blood pressure category: 80-89 | 11.7% (0.6) | 12.7% (1.3) | 14.4% (1.5) | 11.0% (3.4) | 25.0% (3.7) | 23.6% (2.0) |
| Diastolic blood pressure category: 90-99 | 2.8% (0.2) | 2.5% (0.5) | 2.5% (0.6) | 5.5% (3.3) | 8.0% (2.5) | 9.0% (1.0) |
| Diastolic blood pressure category: >=100 | 0.7% (0.1) | 0.7% (0.2) | 0.3% (0.2) | NA | 0.7% (0.5) | 3.2% (0.6) |
| BMI category: Underweight | 2.3% (0.2) | 1.2% (0.3) | 1.8% (0.5) | 0.4% (0.4) | NA | 0.7% (0.3) |
| BMI category: Normal weight | 36.5% (1.0) | 33.9% (2.1) | 38.5% (2.4) | 26.7% (6.6) | 16.9% (3.5) | 18.2% (1.7) |
| BMI category: Overweight | 30.3% (0.8) | 27.6% (1.6) | 29.7% (2.5) | 22.0% (5.3) | 23.8% (3.7) | 25.0% (2.1) |
| BMI category: Obesity I | 17.5% (0.6) | 19.5% (1.4) | 16.3% (1.4) | 25.3% (5.8) | 16.6% (3.2) | 25.7% (1.9) |
| BMI category: Obesity II | 7.9% (0.4) | 10.1% (1.1) | 8.3% (1.1) | 13.5% (3.0) | 16.7% (3.7) | 15.9% (1.4) |
| BMI category: Obesity III | 5.6% (0.3) | 7.7% (0.9) | 5.4% (0.9) | 12.0% (3.5) | 25.9% (3.6) | 14.6% (1.4) |
| Waist-to-height ratio category: <0.5 | 32.9% (1.1) | 31.8% (2.1) | 32.8% (2.1) | 25.3% (6.5) | 15.8% (3.9) | 15.0% (1.5) |
| Waist-to-height ratio category: 0.5-0.59 | 40.1% (0.9) | 35.3% (1.8) | 39.9% (2.0) | 34.9% (6.2) | 19.8% (3.5) | 34.8% (1.8) |
| Waist-to-height ratio category: >=0.6 | 27.1% (1.0) | 32.9% (1.9) | 27.3% (2.0) | 39.8% (6.3) | 64.4% (4.8) | 50.2% (1.8) |
| HDL cholesterol (mg/dL) | 51.20 (0.29) | 51.37 (0.60) | 51.80 (0.66) | 51.40 (2.33) | 47.78 (1.46) | 46.61 (0.67) |
| Non-HDL cholesterol (mg/dL) | 135.62 (0.63) | 132.25 (1.30) | 136.81 (1.75) | 140.36 (4.26) | 151.29 (4.03) | 147.60 (1.81) |
| Total cholesterol (mg/dL) | 186.82 (0.55) | 183.61 (1.13) | 188.61 (1.49) | 191.76 (3.97) | 199.07 (3.80) | 194.21 (1.69) |
| HbA1c (%) | 5.29 (0.01) | 5.29 (0.02) | 5.33 (0.03) | 5.40 (0.09) | 7.92 (0.21) | 5.57 (0.04) |
| Vitamin D category: <20 | 1.2% (0.2) | 1.1% (0.3) | 0.9% (0.3) | 1.4% (1.1) | 0.7% (0.5) | 2.0% (0.5) |
| Vitamin D category: 20-29 | 6.2% (0.5) | 6.1% (0.8) | 4.6% (0.8) | 5.6% (2.0) | 10.5% (2.3) | 7.0% (0.9) |
| Vitamin D category: >=30 | 92.6% (0.6) | 92.8% (0.9) | 94.6% (0.8) | 93.0% (2.5) | 88.8% (2.4) | 91.0% (1.2) |
| Dietary Inflammatory Index (DII) | 2.54 (0.06) | 2.60 (0.12) | 2.30 (0.13) | 3.29 (0.36) | 2.82 (0.24) | 2.78 (0.11) |
| DII including alcohol | 2.68 (0.06) | 2.74 (0.12) | 2.44 (0.13) | 3.49 (0.36) | 3.03 (0.24) | 2.92 (0.11) |
| 1 For each characteristic, survey-weighted means (SE) are shown for continuous variables, and percentages of total within that category are shown for categorical variables. | ||||||
tbl_char_secondary| Table 3: Characteristics of Time-Restricted Subcohorts and Participants with Eczema (2005–2006) and Food Allergy (2007–2010)1 | ||||
|---|---|---|---|---|
| Characteristic | Cohort (2005–2006) (N=1618) | Eczema (N=112) | Cohort (2007–2010) (N=3967) | Food allergy (N=404) |
| Age (years) | 30.26 (0.22) | 29.86 (0.97) | 30.07 (0.15) | 30.23 (0.26) |
| Sex: Male | 51.8% (1.2) | 39.9% (6.1) | 51.4% (0.9) | 43.9% (2.9) |
| Sex: Female | 48.2% (1.2) | 60.1% (6.1) | 48.6% (0.9) | 56.1% (2.9) |
| Race/ethnicity: Non-Hispanic White | 68.3% (2.9) | 79.5% (4.5) | 65.9% (2.8) | 63.0% (3.9) |
| Race/ethnicity: Non-Hispanic Black | 13.7% (2.1) | 16.1% (5.0) | 13.6% (1.2) | 19.4% (2.6) |
| Race/ethnicity: Mexican American | 12.7% (1.5) | 3.0% (1.1) | 13.3% (1.8) | 8.2% (1.4) |
| Race/ethnicity: Other Hispanic | 5.2% (1.3) | 1.4% (1.0) | 7.2% (1.2) | 9.4% (2.2) |
| Race/ethnicity: Other race | NA | NA | NA | NA |
| Insurance: No | 29.5% (2.4) | 21.6% (4.6) | 31.4% (1.2) | 26.2% (3.0) |
| Insurance: Yes | 70.5% (2.4) | 78.4% (4.6) | 68.6% (1.2) | 73.8% (3.0) |
| Education: <9 | 5.3% (0.9) | 0.6% (0.1) | 4.6% (0.5) | 2.7% (1.0) |
| Education: 9-12 | 35.8% (2.4) | 37.4% (6.0) | 36.7% (1.7) | 28.8% (2.9) |
| Education: >12 | 59.0% (2.8) | 61.9% (6.0) | 58.7% (1.7) | 68.5% (3.2) |
| Poverty-income ratio | 2.91 (0.06) | 3.03 (0.13) | 2.68 (0.05) | 2.74 (0.09) |
| Food security: Full | 76.7% (1.6) | 72.3% (5.1) | 86.1% (1.4) | 91.3% (2.6) |
| Food security: Marginal | 9.5% (0.9) | 13.2% (2.6) | 13.9% (1.4) | 8.7% (2.6) |
| Food security: Low | 9.2% (0.9) | 8.3% (3.5) | NA | NA |
| Food security: Very low | 4.6% (0.6) | 6.1% (2.3) | NA | NA |
| Smoking status: Never | 55.9% (1.4) | 46.9% (6.5) | 58.2% (1.7) | 57.2% (2.8) |
| Smoking status: Current | 29.5% (1.8) | 40.1% (6.2) | 27.6% (1.2) | 25.7% (2.5) |
| Smoking status: Former | 14.6% (0.8) | 13.0% (5.2) | 14.2% (1.0) | 17.2% (2.2) |
| Physical activity: Sedentary | 24.7% (1.6) | 20.7% (5.3) | 39.7% (1.3) | 36.0% (3.5) |
| Physical activity: Moderate | 24.0% (0.9) | 25.6% (5.2) | 23.8% (1.0) | 26.1% (2.7) |
| Physical activity: Intense | 51.2% (1.5) | 53.8% (6.4) | 36.5% (1.7) | 37.9% (3.4) |
| Alcohol use: No | 21.2% (1.9) | 14.9% (5.2) | 19.1% (0.9) | 22.9% (2.3) |
| Alcohol use: Yes | 78.8% (1.9) | 85.1% (5.2) | 80.9% (0.9) | 77.1% (2.3) |
| Systolic blood pressure category: <120 | 65.9% (1.9) | 65.4% (6.6) | 69.4% (1.1) | 76.4% (3.1) |
| Systolic blood pressure category: 120-129 | 21.4% (1.5) | 22.1% (3.9) | 21.3% (0.9) | 14.5% (2.6) |
| Systolic blood pressure category: 130-139 | 8.8% (0.9) | 8.9% (3.6) | 5.7% (0.4) | 6.6% (1.5) |
| Systolic blood pressure category: 140-159 | 3.7% (0.6) | 3.6% (2.0) | 3.3% (0.3) | 2.5% (0.9) |
| Systolic blood pressure category: >=160 | 0.2% (0.1) | NA | 0.3% (0.1) | NA |
| Diastolic blood pressure category: <80 | 86.9% (1.6) | 83.9% (5.9) | 84.5% (0.9) | 83.1% (2.6) |
| Diastolic blood pressure category: 80-89 | 10.1% (1.4) | 15.0% (5.8) | 12.0% (0.8) | 11.7% (2.1) |
| Diastolic blood pressure category: 90-99 | 2.6% (0.5) | 1.1% (0.7) | 2.8% (0.4) | 4.2% (1.3) |
| Diastolic blood pressure category: >=100 | 0.5% (0.2) | NA | 0.6% (0.2) | 1.0% (0.7) |
| BMI category: Underweight | 2.6% (0.6) | 1.2% (1.0) | 2.0% (0.3) | 2.0% (0.9) |
| BMI category: Normal weight | 37.7% (1.9) | 44.4% (4.6) | 35.8% (1.3) | 35.9% (3.1) |
| BMI category: Overweight | 31.0% (1.8) | 19.5% (3.4) | 30.1% (0.9) | 28.6% (2.7) |
| BMI category: Obesity I | 16.5% (1.2) | 16.0% (3.8) | 18.0% (0.8) | 18.7% (2.1) |
| BMI category: Obesity II | 7.2% (0.9) | 7.1% (2.7) | 8.3% (0.6) | 8.1% (1.6) |
| BMI category: Obesity III | 5.0% (0.6) | 11.8% (2.6) | 5.8% (0.5) | 6.8% (1.2) |
| Waist-to-height ratio category: <0.5 | 33.7% (2.1) | 34.6% (5.2) | 32.3% (1.5) | 33.6% (3.1) |
| Waist-to-height ratio category: 0.5-0.59 | 41.2% (2.1) | 34.5% (6.1) | 40.3% (1.2) | 36.8% (3.6) |
| Waist-to-height ratio category: >=0.6 | 25.1% (2.2) | 30.9% (6.0) | 27.4% (1.4) | 29.6% (3.0) |
| HDL cholesterol (mg/dL) | 52.58 (0.39) | 53.35 (1.14) | 50.53 (0.42) | 51.17 (1.33) |
| Non-HDL cholesterol (mg/dL) | 136.91 (1.38) | 133.34 (5.78) | 136.43 (0.88) | 132.23 (2.68) |
| Total cholesterol (mg/dL) | 189.50 (1.16) | 186.70 (5.20) | 186.96 (0.75) | 183.40 (2.28) |
| HbA1c (%) | 5.20 (0.02) | 5.09 (0.05) | 5.31 (0.01) | 5.28 (0.03) |
| Vitamin D category: <20 | 0.3% (0.1) | 0.4% (0.5) | 1.6% (0.3) | 2.8% (0.9) |
| Vitamin D category: 20-29 | 5.2% (0.8) | 4.2% (1.7) | 6.4% (0.7) | 9.5% (1.8) |
| Vitamin D category: >=30 | 94.5% (0.9) | 95.3% (1.8) | 92.0% (0.9) | 87.6% (2.4) |
| Dietary Inflammatory Index (DII) | 1.89 (0.06) | 2.09 (0.16) | 3.31 (0.08) | 2.98 (0.14) |
| DII including alcohol | 2.03 (0.07) | 2.22 (0.17) | 3.46 (0.09) | 3.15 (0.14) |
| 1 For each characteristic, survey-weighted means (SE) are shown for continuous variables, and percentages of total within that category are shown for categorical variables. | ||||
Weighted prevalence analysis
Survey-weighted prevalence estimates were calculated for all disease outcomes, both overall and across DII quartiles (Table 4; Figure 1). Overall disease prevalence ranged from 1.3% for inflammatory arthritis to 15.9% for asthma. As compared with the weighted prevalence in the least inflammatory quartile (Q1), the weighted prevalence in the most inflammatory quartile (Q4) was significantly higher for inflammatory arthritis (2.1% vs. 0.8%; p<0.05) and significantly lower for allergic rhinitis (11.5% vs. 14.1%; p<0.05). There was a non-significant trend of higher Q4 disease prevalence for eczema and hypertension. Together, this suggested variable, modest effects of dietary inflammatory content on disease outcomes.
# Weighted prevalence by DII quartile----------------------------------------------------------
## Outcome configurations for prevalence
outcome_list_all <- c(outcome_list, outcome_list_secondary)
compute_prev_by_dii <- function(info) {
des <- info$design
y <- info$var
label_raw <- info$label
### Re-label atopic dermatitis to eczema
label <- recode(label_raw, "Atopic dermatitis" = "Eczema")
f <- as.formula(paste0("~", y))
vars <- des$variables
rows <- list()
### Overall prevalence
est_overall <- svymean(f, design = des, na.rm = TRUE)
p_overall <- as.numeric(est_overall)[1] * 100
se_overall <- as.numeric(SE(est_overall))[1] * 100
rows[[length(rows) + 1]] <- tibble(
Outcome = label,
dii_group = "Overall",
prev_pct = p_overall,
prev_se = se_overall
)
### Prevalence by DII quartile
if ("dii_quart" %in% names(vars)) {
dii_levels <- levels(vars$dii_quart)
for (lvl in dii_levels) {
des_sub <- subset(des, dii_quart == lvl)
est <- svymean(f, design = des_sub, na.rm = TRUE)
p <- as.numeric(est)[1] * 100
se <- as.numeric(SE(est))[1] * 100
rows[[length(rows) + 1]] <- tibble(
Outcome = label,
dii_group = as.character(lvl),
prev_pct = p,
prev_se = se
)
}
}
if (length(rows) == 0L) {
return(tibble(
Outcome = character(0),
dii_group = character(0),
prev_pct = numeric(0),
prev_se = numeric(0)
))
}
bind_rows(rows)
}
prev_all_long <- map_dfr(
outcome_list_all,
compute_prev_by_dii
)
## Harmonize DII group labels and ordering of outcomes
prev_all_long <- prev_all_long |>
mutate(
dii_group_simple = case_when(
dii_group == "Overall" ~ "Overall",
grepl("^Q1", dii_group) ~ "Q1",
grepl("^Q2", dii_group) ~ "Q2",
grepl("^Q3", dii_group) ~ "Q3",
grepl("^Q4", dii_group) ~ "Q4",
TRUE ~ dii_group
),
### Set base outcome order
Outcome = factor(
Outcome,
levels = c(
"Asthma",
"Allergic rhinitis",
"Inflammatory arthritis",
"Diabetes",
"Hypertension",
"Eczema",
"Food allergy"
)
)
)
## Compute p-values for Q2–Q4 vs. Q1 via survey-weighted logistic regression
compute_prev_pvals <- function(info) {
des <- info$design
y <- info$var
label_raw <- info$label
label <- recode(label_raw, "Atopic dermatitis" = "Eczema")
vars <- des$variables
if (!("dii_quart" %in% names(vars))) {
return(tibble(
Outcome = character(0),
dii_group_simple = character(0),
p_vs_Q1 = numeric(0)
))
}
dq <- vars$dii_quart
if (!is.factor(dq)) {
dq <- factor(dq)
}
levels_dq <- levels(dq)
if (length(levels_dq) < 2) {
return(tibble(
Outcome = character(0),
dii_group_simple = character(0),
p_vs_Q1 = numeric(0)
))
}
other_levels <- levels_dq[-1]
### Fit survey-weighted logistic regression
form <- as.formula(paste0(y, " ~ dii_quart"))
fit <- tryCatch(
svyglm(form, design = des, family = quasibinomial()),
error = function(e) NULL
)
if (is.null(fit)) {
return(tibble(
Outcome = character(0),
dii_group_simple = character(0),
p_vs_Q1 = numeric(0)
))
}
coefs <- summary(fit)$coef
out_rows <- purrr::map_dfr(other_levels, function(lvl) {
term <- paste0("dii_quart", lvl)
if (!term %in% rownames(coefs)) {
p_val <- NA_real_
} else {
p_val <- coefs[term, "Pr(>|t|)"]
}
quart_simple <- case_when(
grepl("^Q1", lvl) ~ "Q1",
grepl("^Q2", lvl) ~ "Q2",
grepl("^Q3", lvl) ~ "Q3",
grepl("^Q4", lvl) ~ "Q4",
TRUE ~ lvl
)
tibble(
Outcome = label,
dii_group_simple = quart_simple,
p_vs_Q1 = as.numeric(p_val)
)
})
out_rows
}
prev_pvals <- purrr::map_dfr(
outcome_list_all,
compute_prev_pvals
)
## Join p-values to prevalence data
prev_all_long <- prev_all_long |>
left_join(
prev_pvals,
by = c("Outcome", "dii_group_simple")
)
## Color scheme for outcomes
disease_base_cols <- c(
"Asthma" = "#5E3C99",
"Allergic rhinitis" = "#89CFF0",
"Inflammatory arthritis" = "#009E73",
"Diabetes" = "#F0E442",
"Hypertension" = "#D55E00",
"Eczema" = "#E78AC3",
"Food allergy" = "#4B2E0F"
)
## Alpha for quartiles; overall uses mid alpha
alpha_quart <- c(
"Q1" = 0.4,
"Q2" = 0.6,
"Q3" = 0.8,
"Q4" = 1.0
)
alpha_overall <- 0.7
prev_all_long <- prev_all_long |>
mutate(
base_col = disease_base_cols[as.character(Outcome)],
fill_col = case_when(
dii_group_simple == "Overall" ~ scales::alpha(base_col, alpha_overall),
TRUE ~ scales::alpha(base_col, alpha_quart[dii_group_simple])
),
Outcome_pretty = forcats::fct_recode(
Outcome,
"Allergic\nrhinitis" = "Allergic rhinitis",
"Inflammatory\narthritis" = "Inflammatory arthritis"
),
### Explicit facet order
Outcome_pretty = factor(
Outcome_pretty,
levels = c(
"Asthma",
"Allergic\nrhinitis",
"Inflammatory\narthritis",
"Eczema",
"Food allergy",
"Diabetes",
"Hypertension"
)
),
x_pos = case_when(
dii_group_simple == "Overall" ~ 1.0,
dii_group_simple == "Q1" ~ 2.5,
dii_group_simple == "Q2" ~ 3.5,
dii_group_simple == "Q3" ~ 4.5,
dii_group_simple == "Q4" ~ 5.5,
TRUE ~ NA_real_
)
)
## Split data for quartiles vs. overall
prev_quart <- prev_all_long |>
filter(dii_group_simple != "Overall")
prev_overall <- prev_all_long |>
filter(dii_group_simple == "Overall")
# Weighted prevalence table--------------------------------------------------------------------
## Outcome ordering for table
prev_all_long <- prev_all_long |>
mutate(
Outcome = factor(
Outcome,
levels = c(
"Asthma",
"Allergic rhinitis",
"Inflammatory arthritis",
"Diabetes",
"Hypertension",
"Eczema",
"Food allergy"
)
)
)
table4_df <- prev_all_long |>
mutate(
Condition = case_when(
dii_group_simple == "Overall" ~ "Overall",
TRUE ~ dii_group
)
) |>
select(
Outcome,
Condition,
`Weighted prevalence (%)` = prev_pct,
`SE (%)` = prev_se,
`p value` = p_vs_Q1
) |>
arrange(Outcome, Condition)
nhanes_table4_gt <- table4_df |>
gt(groupname_col = "Outcome") |>
tab_header(
title = md("**Table 4: Weighted Prevalence of Allergic, Immunologic, and Cardiometabolic Conditions Overall and by Dietary Inflammatory Index Quartile**")
) |>
fmt_number(
columns = c(`Weighted prevalence (%)`, `SE (%)`),
decimals = 1
) |>
fmt_number(
columns = `p value`,
decimals = 3
) |>
cols_label(
Condition = md("Condition"),
`Weighted prevalence (%)` = md("Weighted prevalence (%)"),
`SE (%)` = md("SE (%)"),
`p value` = md("p value")
) |>
tab_source_note(
md(
"P values were obtained in survey-weighted logistic regression models of each outcome on DII quartile, comparing Q2 to Q4 with Q1 using Wald tests."
)
) |>
tab_options(
table.font.size = px(12),
data_row.padding = px(1),
row_group.padding = px(1),
column_labels.padding = px(1),
table.width = pct(80)
) |>
tab_style(
style = cell_text(weight = "bold"),
locations = list(
cells_column_labels(everything()),
cells_row_groups()
)
)
# Weighted prevalence plot---------------------------------------------------------------------
weighted_prev_plot_all <- ggplot() +
## Quartile bars
geom_col(
data = prev_quart,
aes(
x = x_pos,
y = prev_pct,
fill = fill_col
),
width = 0.9
) +
## Overall bars
geom_col(
data = prev_overall,
aes(
x = x_pos,
y = prev_pct,
fill = fill_col,
color = base_col
),
width = 0.9,
size = 0.9
) +
facet_wrap(~ Outcome_pretty, ncol = 3, scales = "free_y") +
scale_fill_identity(guide = "none") +
scale_color_identity(guide = "none") +
scale_x_continuous(
breaks = c(1.0, 2.5, 3.5, 4.5, 5.5),
labels = c("Overall", "Q1", "Q2", "Q3", "Q4"),
expand = expansion(mult = c(0.05, 0.05))
) +
labs(
x = NULL,
y = "Weighted prevalence (%)",
title = "Figure 1: Weighted Prevalence of Allergic, Immunologic,\nCardiometabolic Conditions,\nOverall and by Dietary Inflammatory Index Quartile"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 15, hjust = 0.5),
strip.text = element_text(face = "bold", size = 12), # one size smaller
axis.title.y = element_text(face = "bold", size = 12),
axis.title.x = element_blank(),
axis.text.x = element_text(size = 11, angle = 45, hjust = 1),
axis.text.y = element_text(size = 11),
aspect.ratio = 1,
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_line(color = "grey90"),
panel.grid.minor.y = element_blank()
)
# Print Table 4 and Figure 1-------------------------------------------------------------------
nhanes_table4_gt| Table 4: Weighted Prevalence of Allergic, Immunologic, and Cardiometabolic Conditions Overall and by Dietary Inflammatory Index Quartile | |||
|---|---|---|---|
| Condition | Weighted prevalence (%) | SE (%) | p value |
| Asthma | |||
| Overall | 15.9 | 0.6 | NA |
| Q1 (most anti-inflammatory) | 16.1 | 1.2 | NA |
| Q2 | 14.9 | 1.1 | 0.412 |
| Q3 | 16.1 | 1.2 | 0.994 |
| Q4 (most pro-inflammatory) | 17.0 | 1.0 | 0.566 |
| Allergic rhinitis | |||
| Overall | 11.9 | 0.5 | NA |
| Q1 (most anti-inflammatory) | 14.1 | 1.1 | NA |
| Q2 | 11.6 | 1.0 | 0.103 |
| Q3 | 11.5 | 1.0 | 0.082 |
| Q4 (most pro-inflammatory) | 11.5 | 0.8 | 0.035 |
| Inflammatory arthritis | |||
| Overall | 1.3 | 0.1 | NA |
| Q1 (most anti-inflammatory) | 0.8 | 0.3 | NA |
| Q2 | 1.6 | 0.4 | 0.063 |
| Q3 | 0.9 | 0.2 | 0.721 |
| Q4 (most pro-inflammatory) | 2.1 | 0.5 | 0.013 |
| Diabetes | |||
| Overall | 2.0 | 0.2 | NA |
| Q1 (most anti-inflammatory) | 2.1 | 0.3 | NA |
| Q2 | 1.3 | 0.3 | 0.092 |
| Q3 | 2.5 | 0.4 | 0.414 |
| Q4 (most pro-inflammatory) | 2.4 | 0.4 | 0.527 |
| Hypertension | |||
| Overall | 11.3 | 0.5 | NA |
| Q1 (most anti-inflammatory) | 10.5 | 0.9 | NA |
| Q2 | 10.5 | 1.0 | 0.994 |
| Q3 | 11.6 | 1.0 | 0.403 |
| Q4 (most pro-inflammatory) | 13.1 | 0.9 | 0.062 |
| Eczema | |||
| Overall | 7.6 | 0.8 | NA |
| Q1 (most anti-inflammatory) | 7.2 | 1.5 | NA |
| Q2 | 6.9 | 1.2 | 0.866 |
| Q3 | 8.0 | 1.2 | 0.673 |
| Q4 (most pro-inflammatory) | 11.4 | 4.3 | 0.308 |
| Food allergy | |||
| Overall | 10.1 | 0.5 | NA |
| Q1 (most anti-inflammatory) | 11.9 | 1.4 | NA |
| Q2 | 9.3 | 1.0 | 0.130 |
| Q3 | 11.6 | 1.2 | 0.882 |
| Q4 (most pro-inflammatory) | 9.1 | 0.8 | 0.085 |
| P values were obtained in survey-weighted logistic regression models of each outcome on DII quartile, comparing Q2 to Q4 with Q1 using Wald tests. | |||
weighted_prev_plot_allUnivariable regression analyses
Survey-weighted univariable logistic regression analyses were performed to examine associations between core sociodemographic, behavioral, clinical, and dietary covariates (Tables 5–6) or individual nutrient quartile covariates (Tables 7–8) and disease outcomes. Results for asthma, allergic rhinitis, inflammatory arthritis, diabetes, and hypertension are shown in Table 5 and Table 7. Results for eczema and food allergy are shown in Table 6 and Table 8.
Among the core covariates, the odds of allergic rhinitis increased with age (OR 1.06, 95% CI 1.05–1.08; p<0.001), while asthma decreased with age (OR 0.97, 95% CI 0.96–0.99; p<0.001). Female participants had higher odds of asthma (OR 1.35, 95% CI 1.17–1.56; p<0.001), allergic rhinitis (OR 1.39, 95% CI 1.19–1.61; p<0.001), and inflammatory arthritis (OR 2.31, 95% CI 1.53–3.49; p<0.001). Non-Hispanic Black (OR 0.62, 95% CI 0.50–0.78; p<0.001) and Mexican American (OR 0.35, 95% CI 0.27–0.45; p<0.001) participants had lower odds of allergic rhinitis, and Mexican American participants (OR 0.42, 95% CI 0.32–0.55; p<0.001) also had lower odds of asthma. Smoking was associated with increased odds of inflammatory arthritis (OR 2.64, 95% CI 1.63–4.27; p<0.001). Among secondary outcomes, there was a trend of higher odds of eczema associated with obesity class III (OR 2.22, 95% CI 1.24–3.97; p=0.088). Mexican American participants had lower odds of food allergy (OR 0.62, 95% CI 0.48–0.81; p=0.013). Cardiometabolic outcomes were strongly associated with age and adiposity, for example increased odds of hypertension with age (OR 1.09, 95% CI 1.07–1.10; p<0.001), obesity class III (OR 7.04, 95% CI 5.26–9.41; p<0.001), and waist-to-height ratio ≥0.6 (OR 4.87, 95% CI 3.89–6.10; p<0.001).
Regarding dietary inflammatory exposures, DII inclusive of alcohol intake was significantly associated with lower odds of allergic rhinitis (OR 0.96, 95% CI 0.92–0.99; p=0.044) and higher odds of hypertension (OR 1.05, 95% CI 1.01–1.08; p=0.049). In general, DII associations did not achieve statistical significance.
In nutrient-specific analyses, there was a large degree of variability across conditions, and many associations did not achieve statistical significance. Nevertheless, several significant associations were identified. For inflammatory arthritis, intakes of carbohydrates (Q3 OR 0.26, 95% CI 0.13–0.49; p=0.019) and total energy (Q3 OR 0.31, 95% CI 0.16–0.60; p=0.047) within the third quartile of the intake distribution were associated with lower odds of disease. Several non-significant trends were observed for cardiometabolic outcomes. For example, higher saturated fat intake (Q4 OR 1.29, 95% CI 1.04–1.60; p=0.218) and higher cholesterol intake (Q4 OR 1.39, 95% CI 1.11–1.73; p=0.095) showed a trend of increased odds of hypertension.
# Univariable analysis 1: core covariates------------------------------------------------------
## Define univariable analysis 1 predictors
uni1_predictors <- c(
"age",
"sex",
"race",
"insur",
"educ",
"pir",
"food_sec",
"smoking",
"phys_act",
"alcohol",
"sbp_cat",
"dbp_cat",
"bmi_cat",
"whtr_cat",
"hdl",
"non_hdl",
"tchol",
"hba1c",
"vitD25oh_cat",
"vitD25oh_cat_trend",
"dii",
"dii_quart_trend",
"dii_etoh"
)
uni1_predictor_order <- uni1_predictors
## Helper function to clean Vitamin D and DII labels
clean_vitD_DII_labels <- function(df) {
df |>
mutate(
variable_label = as.character(variable_label),
variable_label = str_replace_all(variable_label, "25\\(OH\\)D", "Vitamin D"),
variable_label = str_replace(
variable_label,
"Vitamin D category trend \\(0–2\\)",
"Vitamin D category trend"
),
variable_label = str_replace(
variable_label,
"category trend \\(0–2\\)",
"category trend"
),
variable_label = str_replace(
variable_label,
"Dietary Inflammatory Index \\(per unit\\)",
"DII"
),
variable_label = str_replace(
variable_label,
"DII including alcohol \\(per unit\\)",
"DII including alcohol"
),
variable_label = str_replace(
variable_label,
"DII quartile trend \\(0–3\\)",
"DII quartile trend"
),
variable_label = factor(variable_label)
)
}
## Helper function for univariable tables in gt
make_uni_gt <- function(data, title_text) {
data |>
gt() |>
tab_header(
title = md(title_text)
) |>
fmt_markdown(columns = everything()) |>
tab_options(table.font.size = px(10)) |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels(everything())
) |>
tab_footnote(
footnote = "p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method",
locations = cells_column_labels(columns = contains("p value"))
)
}
## Univariable analysis 1 helper function
run_uni_model1 <- function(outcome_name, outcome_info, predictor) {
des <- outcome_info$design
y <- outcome_info$var
if (!predictor %in% names(des$variables)) {
warning("Predictor ", predictor, " not found in design for outcome ", outcome_name, ". Skipping.")
return(tibble())
}
form_uni <- as.formula(paste0(y, " ~ ", predictor))
fit <- svyglm(
form_uni,
design = des,
family = quasibinomial()
)
td <- tidy(fit, exponentiate = TRUE, conf.int = TRUE) |>
filter(term != "(Intercept)") |>
mutate(
outcome = outcome_name,
outcome_lab = outcome_info$label,
predictor = predictor
)
base_label <- uni1_labels[[predictor]]
if (is.null(base_label)) base_label <- predictor
add_ref_and_label(
td = td,
des = des,
predictor = predictor,
outcome_name = outcome_name,
outcome_info = outcome_info,
base_label = base_label
)
}
## Univariable analysis 1 primary outcomes
uni1_primary_long <- map_dfr(
names(outcome_list),
function(out_nm) {
info <- outcome_list[[out_nm]]
map_dfr(uni1_predictors, ~ run_uni_model1(out_nm, info, .x))
}
)
uni1_primary_long <- uni1_primary_long |>
rename(
OR = estimate,
LCL = conf.low,
UCL = conf.high,
p = p.value
) |>
mutate(
p_fdr = p.adjust(p, method = "BH"),
or_ci = if_else(
is.na(p) & OR == 1 & LCL == 1 & UCL == 1,
"1 (Ref)",
sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
),
p_fdr_fmt = ifelse(is.na(p_fdr), "", pvalue(p_fdr, accuracy = 0.001))
) |>
### Use predictor and variable_label to pull in ranks for ordering
left_join(custom_order_uni1, by = c("predictor", "variable_label")) |>
mutate(
predictor = factor(predictor, levels = uni1_predictor_order),
rank = coalesce(rank, 1000L)
) |>
clean_vitD_DII_labels() |>
### Override rank for specific predictors/levels to re-order
mutate(
rank2 = case_when(
predictor == "educ" & str_detect(variable_label, "Education: <9 years") ~ 1L,
predictor == "educ" & str_detect(variable_label, "Education: 9–12 years") ~ 2L,
predictor == "educ" & str_detect(variable_label, "Education: >12 years") ~ 3L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Underweight") ~ 1L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Normal") ~ 2L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Overweight") ~ 3L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity I") ~ 4L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity II") ~ 5L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity III")~ 6L,
predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: <20") ~ 1L,
predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: 20–29") ~ 2L,
predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: ≥30") ~ 3L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: <120") ~ 1L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 120–129") ~ 2L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 130–139") ~ 3L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 140–159") ~ 4L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: ≥160") ~ 5L,
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: <80") ~ 1L,
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: 80–89") ~ 2L,
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: 90–99") ~ 3L,
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: ≥100") ~ 4L,
predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: <0.5") ~ 1L,
predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: 0.5–0.59") ~ 2L,
predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: ≥0.6") ~ 3L,
TRUE ~ rank
)
) |>
arrange(predictor, rank2, variable_label, outcome_lab)
uni1_primary_wide <- uni1_primary_long |>
select(
variable_label,
outcome_lab,
or_ci,
p_fdr_fmt
) |>
pivot_wider(
id_cols = variable_label,
names_from = outcome_lab,
values_from = c(or_ci, p_fdr_fmt),
names_glue = "{outcome_lab} {.value}"
) |>
rename_with(~ str_replace(.x, " or_ci", " OR (95% CI)")) |>
rename_with(~ str_replace(.x, " p_fdr_fmt", " p value")) |>
rename(Variable = variable_label)
uni1_primary_outcomes_order <- c(
"Asthma",
"Allergic rhinitis",
"Inflammatory arthritis",
"Diabetes",
"Hypertension"
)
uni1_primary_col_order <- c(
"Variable",
as.vector(rbind(
paste0(uni1_primary_outcomes_order, " OR (95% CI)"),
paste0(uni1_primary_outcomes_order, " p value")
))
)
uni1_primary_wide <- uni1_primary_wide |>
select(all_of(uni1_primary_col_order))
tbl_uni1_primary <- make_uni_gt(
uni1_primary_wide,
"**Table 5: Univariable Associations of Core Covariates With Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension**"
)
## Univariable analysis 1 secondary outcomes
uni1_secondary_long <- map_dfr(
names(outcome_list_secondary),
function(out_nm) {
info <- outcome_list_secondary[[out_nm]]
map_dfr(uni1_predictors, ~ run_uni_model1(out_nm, info, .x))
}
)
uni1_secondary_long <- uni1_secondary_long |>
rename(
OR = estimate,
LCL = conf.low,
UCL = conf.high,
p = p.value
) |>
mutate(
p_fdr = p.adjust(p, method = "BH"),
or_ci = if_else(
is.na(p) & OR == 1 & LCL == 1 & UCL == 1,
"1 (Ref)",
sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
),
p_fdr_fmt = ifelse(is.na(p_fdr), "", pvalue(p_fdr, accuracy = 0.001))
) |>
left_join(custom_order_uni1, by = c("predictor", "variable_label")) |>
mutate(
predictor = factor(predictor, levels = uni1_predictor_order),
rank = coalesce(rank, 1000L)
) |>
clean_vitD_DII_labels() |>
mutate(
rank2 = case_when(
predictor == "educ" & str_detect(variable_label, "Education: <9 years") ~ 1L,
predictor == "educ" & str_detect(variable_label, "Education: 9–12 years") ~ 2L,
predictor == "educ" & str_detect(variable_label, "Education: >12 years") ~ 3L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Underweight") ~ 1L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Normal") ~ 2L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Overweight") ~ 3L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity I") ~ 4L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity II") ~ 5L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity III")~ 6L,
predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: <20") ~ 1L,
predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: 20–29") ~ 2L,
predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: ≥30") ~ 3L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: <120") ~ 1L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 120–129") ~ 2L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 130–139") ~ 3L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 140–159") ~ 4L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: ≥160") ~ 5L,
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: <80") ~ 1L,
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: 80–89") ~ 2L,
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: 90–99") ~ 3L,
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: ≥100") ~ 4L,
predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: <0.5") ~ 1L,
predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: 0.5–0.59") ~ 2L,
predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: ≥0.6") ~ 3L,
TRUE ~ rank
)
) |>
arrange(predictor, rank2, variable_label, outcome_lab)
uni1_secondary_wide <- uni1_secondary_long |>
select(
variable_label,
outcome_lab,
or_ci,
p_fdr_fmt
) |>
pivot_wider(
id_cols = variable_label,
names_from = outcome_lab,
values_from = c(or_ci, p_fdr_fmt),
names_glue = "{outcome_lab} {.value}"
) |>
rename_with(~ str_replace(.x, " or_ci", " OR (95% CI)")) |>
rename_with(~ str_replace(.x, " p_fdr_fmt", " p value")) |>
rename(Variable = variable_label)
uni1_secondary_outcomes_order <- c(
"Atopic dermatitis",
"Food allergy"
)
uni1_secondary_col_order <- c(
"Variable",
as.vector(rbind(
paste0(uni1_secondary_outcomes_order, " OR (95% CI)"),
paste0(uni1_secondary_outcomes_order, " p value")
))
)
uni1_secondary_wide <- uni1_secondary_wide |>
select(all_of(uni1_secondary_col_order))
tbl_uni1_secondary <- make_uni_gt(
uni1_secondary_wide,
"**Table 6: Univariable Associations of Core Covariates With Eczema and Food Allergy**"
)
# Univariable modeling 2: individual DII nutrient quartiles------------------------------------
## Define univariable analysis 2 predictors
dii_nutrient_bases <- c(
"kcal", "carb", "protein", "totalfat", "satfat",
"pufa", "mufa", "n3fat", "n6fat", "choles",
"vitA", "bcarotene", "vitC", "vitE",
"vitB6", "vitB12", "folicacid",
"thiamin", "riboflavin", "niacin",
"iron", "mg", "zn", "se",
"fiber", "caffeine"
)
nutrient_labels <- c(
kcal = "Total energy intake (kcal)",
carb = "Carbohydrate intake",
protein = "Protein intake",
totalfat = "Total fat intake",
satfat = "Saturated fat intake",
pufa = "Polyunsaturated fat intake",
mufa = "Monounsaturated fat intake",
n3fat = "Omega-3 fat intake",
n6fat = "Omega-6 fat intake",
choles = "Cholesterol intake",
vitA = "Vitamin A intake",
bcarotene = "Beta-carotene intake",
vitC = "Vitamin C intake",
vitE = "Vitamin E intake",
vitB6 = "Vitamin B6 intake",
vitB12 = "Vitamin B12 intake",
folicacid = "Folic acid intake",
thiamin = "Thiamin intake",
riboflavin = "Riboflavin intake",
niacin = "Niacin intake",
iron = "Iron intake",
mg = "Magnesium intake",
zn = "Zinc intake",
se = "Selenium intake",
fiber = "Fiber intake",
caffeine = "Caffeine intake"
)
dii_nutrient_quart_predictors <- paste0(dii_nutrient_bases, "_quart")
## Univariable analysis 2 helper function
run_uni_model2 <- function(outcome_name, outcome_info, predictor) {
des <- outcome_info$design
y <- outcome_info$var
if (!predictor %in% names(des$variables)) {
warning("Predictor ", predictor, " not found in design for outcome ", outcome_name, ". Skipping.")
return(tibble())
}
form_uni <- as.formula(paste0(y, " ~ ", predictor))
fit <- svyglm(
form_uni,
design = des,
family = quasibinomial()
)
td <- tidy(fit, exponentiate = TRUE, conf.int = TRUE) |>
filter(term != "(Intercept)") |>
mutate(
outcome = outcome_name,
outcome_lab = outcome_info$label,
predictor = predictor
)
base <- sub("_quart$", "", predictor)
base_label <- nutrient_labels[[base]]
if (is.null(base_label)) base_label <- base
var_data <- des$variables[[predictor]]
if (is.character(var_data)) {
var_data <- factor(var_data)
}
if (is.factor(var_data)) {
levels_vec <- levels(var_data)
pattern <- paste0("^", predictor)
td_nonref <- td |>
mutate(
level = sub(pattern, "", term),
variable_label = paste0(base_label, ": ", level)
)
ref_level <- levels_vec[1]
ref_label <- paste0(base_label, ": ", ref_level)
ref_row <- tibble(
term = paste0(predictor, ref_level),
estimate = 1,
conf.low = 1,
conf.high = 1,
p.value = NA_real_,
outcome = outcome_name,
outcome_lab = outcome_info$label,
predictor = predictor,
level = ref_level,
variable_label = ref_label
)
bind_rows(ref_row, td_nonref)
} else {
td |>
mutate(variable_label = base_label)
}
}
## Univariable analysis 2 primary outcomes
uni2_primary_long <- map_dfr(
names(outcome_list),
function(out_nm) {
info <- outcome_list[[out_nm]]
map_dfr(dii_nutrient_quart_predictors, ~ run_uni_model2(out_nm, info, .x))
}
)
uni2_primary_long <- uni2_primary_long |>
rename(
OR = estimate,
LCL = conf.low,
UCL = conf.high,
p = p.value
) |>
mutate(
p_fdr = p.adjust(p, method = "BH"),
or_ci = if_else(
is.na(p) & OR == 1 & LCL == 1 & UCL == 1,
"1 (Ref)",
sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
),
p_fdr_fmt = ifelse(is.na(p_fdr), "", pvalue(p_fdr, accuracy = 0.001))
) |>
mutate(
base = sub("_quart$", "", predictor),
base = factor(base, levels = dii_nutrient_bases),
variable_label = factor(variable_label)
) |>
arrange(base, variable_label, outcome_lab)
uni2_primary_wide <- uni2_primary_long |>
select(
variable_label,
outcome_lab,
or_ci,
p_fdr_fmt
) |>
pivot_wider(
id_cols = variable_label,
names_from = outcome_lab,
values_from = c(or_ci, p_fdr_fmt),
names_glue = "{outcome_lab} {.value}"
) |>
rename_with(~ str_replace(.x, " or_ci", " OR (95% CI)")) |>
rename_with(~ str_replace(.x, " p_fdr_fmt", " p value")) |>
rename(Variable = variable_label)
uni2_primary_outcomes_order <- c(
"Asthma",
"Allergic rhinitis",
"Inflammatory arthritis",
"Diabetes",
"Hypertension"
)
uni2_primary_col_order <- c(
"Variable",
as.vector(rbind(
paste0(uni2_primary_outcomes_order, " OR (95% CI)"),
paste0(uni2_primary_outcomes_order, " p value")
))
)
uni2_primary_wide <- uni2_primary_wide |>
select(all_of(uni2_primary_col_order))
tbl_uni2_primary <- make_uni_gt(
uni2_primary_wide,
"**Table 7: Univariable Associations of Individual Nutritional Covariates With Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension**"
)
## Univariable analysis 2 secondary outcomes
uni2_secondary_long <- map_dfr(
names(outcome_list_secondary),
function(out_nm) {
info <- outcome_list_secondary[[out_nm]]
map_dfr(dii_nutrient_quart_predictors, ~ run_uni_model2(out_nm, info, .x))
}
)
uni2_secondary_long <- uni2_secondary_long |>
rename(
OR = estimate,
LCL = conf.low,
UCL = conf.high,
p = p.value
) |>
mutate(
p_fdr = p.adjust(p, method = "BH"),
or_ci = if_else(
is.na(p) & OR == 1 & LCL == 1 & UCL == 1,
"1 (Ref)",
sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
),
p_fdr_fmt = ifelse(is.na(p_fdr), "", pvalue(p_fdr, accuracy = 0.001))
) |>
mutate(
base = sub("_quart$", "", predictor),
base = factor(base, levels = dii_nutrient_bases),
variable_label = factor(variable_label)
) |>
arrange(base, variable_label, outcome_lab)
uni2_secondary_wide <- uni2_secondary_long |>
select(
variable_label,
outcome_lab,
or_ci,
p_fdr_fmt
) |>
pivot_wider(
id_cols = variable_label,
names_from = outcome_lab,
values_from = c(or_ci, p_fdr_fmt),
names_glue = "{outcome_lab} {.value}"
) |>
rename_with(~ str_replace(.x, " or_ci", " OR (95% CI)")) |>
rename_with(~ str_replace(.x, " p_fdr_fmt", " p value")) |>
rename(Variable = variable_label)
uni2_secondary_outcomes_order <- c(
"Atopic dermatitis",
"Food allergy"
)
uni2_secondary_col_order <- c(
"Variable",
as.vector(rbind(
paste0(uni2_secondary_outcomes_order, " OR (95% CI)"),
paste0(uni2_secondary_outcomes_order, " p value")
))
)
uni2_secondary_wide <- uni2_secondary_wide |>
select(all_of(uni2_secondary_col_order))
tbl_uni2_secondary <- make_uni_gt(
uni2_secondary_wide,
"**Table 8: Univariable Associations of Individual Nutritional Covariates With Eczema and Food Allergy**"
)
# Print univariable analysis tables------------------------------------------------------------
tbl_uni1_primary| Table 5: Univariable Associations of Core Covariates With Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Variable | Asthma OR (95% CI) | Asthma p value1 | Allergic rhinitis OR (95% CI) | Allergic rhinitis p value1 | Inflammatory arthritis OR (95% CI) | Inflammatory arthritis p value1 | Diabetes OR (95% CI) | Diabetes p value1 | Hypertension OR (95% CI) | Hypertension p value1 |
| Age (years) | 0.97 (0.96, 0.99) | <0.001 | 1.06 (1.05, 1.08) | <0.001 | 1.10 (1.05, 1.15) | <0.001 | 1.11 (1.07, 1.14) | <0.001 | 1.09 (1.07, 1.10) | <0.001 |
| Sex: Male | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Sex: Female | 1.35 (1.17, 1.56) | <0.001 | 1.39 (1.19, 1.61) | <0.001 | 2.31 (1.53, 3.49) | <0.001 | 1.25 (0.85, 1.83) | 0.365 | 0.83 (0.71, 0.97) | 0.050 |
| Race/ethnicity: Non-Hispanic White | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Race/ethnicity: Non-Hispanic Black | 1.04 (0.87, 1.24) | 0.799 | 0.62 (0.50, 0.78) | <0.001 | 0.91 (0.52, 1.58) | 0.833 | 1.96 (1.30, 2.95) | 0.007 | 1.69 (1.39, 2.05) | <0.001 |
| Race/ethnicity: Mexican American | 0.42 (0.32, 0.55) | <0.001 | 0.35 (0.27, 0.45) | <0.001 | 0.50 (0.26, 0.96) | 0.088 | 1.04 (0.66, 1.64) | 0.910 | 0.80 (0.62, 1.03) | 0.151 |
| Race/ethnicity: Other Hispanic | 0.77 (0.57, 1.03) | 0.145 | 0.62 (0.42, 0.92) | 0.049 | 0.48 (0.20, 1.16) | 0.182 | 1.66 (1.00, 2.75) | 0.103 | 0.82 (0.60, 1.11) | 0.300 |
| Insurance: No | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Insurance: Yes | 1.25 (1.05, 1.50) | 0.043 | 1.46 (1.19, 1.79) | 0.002 | 2.22 (1.27, 3.89) | 0.020 | 1.40 (0.95, 2.07) | 0.155 | 1.31 (1.12, 1.54) | 0.004 |
| Education: <9 years | 0.36 (0.20, 0.65) | 0.004 | 0.44 (0.27, 0.71) | 0.004 | 1.80 (0.63, 5.11) | 0.378 | 1.47 (0.67, 3.21) | 0.453 | 0.76 (0.48, 1.20) | 0.348 |
| Education: 9–12 years | 0.95 (0.81, 1.12) | 0.690 | 0.62 (0.52, 0.75) | <0.001 | 1.28 (0.79, 2.08) | 0.424 | 1.27 (0.87, 1.87) | 0.330 | 1.12 (0.95, 1.33) | 0.273 |
| Education: >12 years | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Poverty-income ratio | 0.95 (0.90, 1.01) | 0.190 | 1.12 (1.06, 1.18) | <0.001 | 0.84 (0.73, 0.96) | 0.040 | 0.87 (0.79, 0.96) | 0.019 | 0.95 (0.90, 1.00) | 0.111 |
| Food security: Full | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Food security: Marginal | 1.14 (0.81, 1.60) | 0.599 | 0.79 (0.62, 1.02) | 0.131 | 1.98 (1.08, 3.63) | 0.068 | 1.66 (0.86, 3.21) | 0.220 | 1.46 (1.09, 1.96) | 0.036 |
| Food security: Low | 0.89 (0.59, 1.34) | 0.698 | 0.66 (0.34, 1.26) | 0.309 | 1.05 (0.29, 3.80) | 0.970 | 1.10 (0.40, 3.00) | 0.910 | 0.99 (0.61, 1.61) | 0.993 |
| Food security: Very low | 2.50 (1.75, 3.55) | <0.001 | 0.63 (0.37, 1.06) | 0.149 | 3.08 (1.07, 8.82) | 0.085 | 3.12 (0.94, 10.39) | 0.122 | 1.07 (0.49, 2.34) | 0.916 |
| Smoking status: Never | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Smoking status: Current | 1.48 (1.22, 1.79) | <0.001 | 0.96 (0.76, 1.21) | 0.833 | 2.64 (1.63, 4.27) | <0.001 | 1.04 (0.69, 1.57) | 0.910 | 1.60 (1.28, 2.00) | <0.001 |
| Smoking status: Former | 1.04 (0.81, 1.34) | 0.833 | 1.26 (0.95, 1.69) | 0.190 | 1.42 (0.70, 2.88) | 0.448 | 0.89 (0.46, 1.73) | 0.833 | 1.48 (1.14, 1.93) | 0.014 |
| Physical activity: Sedentary | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Physical activity: Moderate | 1.14 (0.94, 1.38) | 0.300 | 1.45 (1.18, 1.78) | 0.002 | 1.48 (0.90, 2.44) | 0.212 | 1.04 (0.69, 1.58) | 0.910 | 1.03 (0.83, 1.28) | 0.861 |
| Physical activity: Intense | 1.13 (0.97, 1.31) | 0.217 | 1.23 (1.02, 1.47) | 0.073 | 0.52 (0.29, 0.94) | 0.073 | 0.41 (0.27, 0.62) | <0.001 | 0.74 (0.58, 0.93) | 0.033 |
| Alcohol use: No | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Alcohol use: Yes | 1.26 (1.02, 1.55) | 0.073 | 1.21 (1.01, 1.46) | 0.095 | 0.70 (0.43, 1.13) | 0.239 | 0.63 (0.42, 0.95) | 0.068 | 1.12 (0.93, 1.36) | 0.336 |
| Systolic blood pressure category: <120 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Systolic blood pressure category: 120–129 | 1.12 (0.93, 1.36) | 0.338 | 0.91 (0.74, 1.12) | 0.489 | 0.76 (0.34, 1.72) | 0.634 | 2.88 (1.82, 4.56) | <0.001 | 2.98 (2.31, 3.83) | <0.001 |
| Systolic blood pressure category: 130–139 | 0.91 (0.70, 1.19) | 0.634 | 0.88 (0.60, 1.29) | 0.634 | 1.17 (0.57, 2.43) | 0.786 | 3.29 (1.70, 6.39) | 0.003 | 5.86 (4.21, 8.16) | <0.001 |
| Systolic blood pressure category: 140–159 | 0.94 (0.60, 1.45) | 0.843 | 0.85 (0.59, 1.24) | 0.526 | 0.00 (0.00, 0.00) | <0.001 | 2.55 (1.39, 4.68) | 0.011 | 8.90 (6.12, 12.92) | <0.001 |
| Systolic blood pressure category: ≥160 | 1.24 (0.44, 3.55) | 0.799 | 0.65 (0.18, 2.27) | 0.631 | 0.00 (0.00, 0.00) | <0.001 | 9.29 (1.92, 45.01) | 0.020 | 50.34 (21.90, 115.71) | <0.001 |
| Diastolic blood pressure category: <80 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Diastolic blood pressure category: 80–89 | 1.11 (0.89, 1.39) | 0.486 | 1.31 (0.99, 1.73) | 0.119 | 0.97 (0.48, 1.95) | 0.968 | 2.81 (1.86, 4.24) | <0.001 | 3.14 (2.48, 3.98) | <0.001 |
| Diastolic blood pressure category: 90–99 | 0.89 (0.56, 1.42) | 0.757 | 0.91 (0.57, 1.48) | 0.832 | 2.08 (0.61, 7.14) | 0.351 | 3.82 (1.91, 7.63) | 0.001 | 6.09 (4.47, 8.30) | <0.001 |
| Diastolic blood pressure category: ≥100 | 1.10 (0.58, 2.09) | 0.843 | 0.39 (0.11, 1.37) | 0.234 | 0.00 (0.00, 0.00) | <0.001 | 1.43 (0.30, 6.68) | 0.780 | 13.34 (7.50, 23.73) | <0.001 |
| BMI category: Underweight (<18.5) | 0.54 (0.30, 0.97) | 0.089 | 0.73 (0.41, 1.31) | 0.406 | 0.27 (0.03, 2.40) | 0.351 | 0.00 (0.00, 0.00) | <0.001 | 0.58 (0.23, 1.42) | 0.340 |
| BMI category: Normal (18.5–24.9) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| BMI category: Overweight (25–29.9) | 0.98 (0.82, 1.18) | 0.896 | 0.93 (0.71, 1.21) | 0.698 | 1.00 (0.41, 2.45) | 0.994 | 1.71 (0.98, 2.99) | 0.119 | 1.73 (1.29, 2.30) | 0.001 |
| BMI category: Obesity I (30–34.9) | 1.25 (0.98, 1.58) | 0.139 | 0.87 (0.69, 1.11) | 0.373 | 2.02 (0.90, 4.51) | 0.157 | 2.07 (1.10, 3.89) | 0.064 | 3.33 (2.60, 4.27) | <0.001 |
| BMI category: Obesity II (35–39.9) | 1.46 (1.09, 1.97) | 0.038 | 1.00 (0.70, 1.42) | 0.994 | 2.39 (1.28, 4.47) | 0.021 | 4.71 (2.29, 9.71) | <0.001 | 4.89 (3.71, 6.44) | <0.001 |
| BMI category: Obesity III (≥40) | 1.63 (1.19, 2.22) | 0.010 | 0.91 (0.60, 1.38) | 0.783 | 3.12 (1.34, 7.28) | 0.028 | 10.89 (6.50, 18.26) | <0.001 | 7.04 (5.26, 9.41) | <0.001 |
| Waist-to-height ratio category: <0.5 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Waist-to-height ratio category: 0.5–0.59 | 0.89 (0.74, 1.08) | 0.366 | 1.00 (0.82, 1.22) | 0.994 | 1.13 (0.52, 2.47) | 0.837 | 1.03 (0.52, 2.02) | 0.970 | 2.01 (1.56, 2.59) | <0.001 |
| Waist-to-height ratio category: ≥0.6 | 1.32 (1.08, 1.61) | 0.021 | 1.02 (0.81, 1.29) | 0.910 | 1.97 (0.96, 4.04) | 0.126 | 5.14 (2.75, 9.63) | <0.001 | 4.87 (3.89, 6.10) | <0.001 |
| HDL cholesterol (mg/dL) | 1.00 (1.00, 1.01) | 0.833 | 1.00 (1.00, 1.01) | 0.448 | 1.00 (0.98, 1.02) | 0.970 | 0.98 (0.97, 1.00) | 0.083 | 0.97 (0.96, 0.98) | <0.001 |
| Non-HDL cholesterol (mg/dL) | 1.00 (1.00, 1.00) | 0.017 | 1.00 (1.00, 1.00) | 0.534 | 1.00 (1.00, 1.01) | 0.336 | 1.01 (1.01, 1.01) | <0.001 | 1.01 (1.01, 1.01) | <0.001 |
| Total cholesterol (mg/dL) | 1.00 (1.00, 1.00) | 0.010 | 1.00 (1.00, 1.00) | 0.254 | 1.00 (1.00, 1.01) | 0.293 | 1.01 (1.00, 1.01) | 0.002 | 1.01 (1.00, 1.01) | <0.001 |
| HbA1c (%) | 0.98 (0.88, 1.10) | 0.845 | 1.08 (0.97, 1.21) | 0.272 | 1.17 (1.00, 1.37) | 0.116 | 4.52 (3.18, 6.44) | <0.001 | 1.50 (1.38, 1.63) | <0.001 |
| Vitamin D category: <20 (deficient) | 0.87 (0.51, 1.47) | 0.720 | 0.66 (0.32, 1.37) | 0.372 | 1.14 (0.26, 5.00) | 0.913 | 0.61 (0.14, 2.62) | 0.634 | 1.79 (1.05, 3.04) | 0.073 |
| Vitamin D category: 20–29 (insufficient) | 0.99 (0.72, 1.36) | 0.970 | 0.69 (0.47, 1.02) | 0.119 | 0.87 (0.41, 1.88) | 0.833 | 1.81 (1.12, 2.92) | 0.046 | 1.18 (0.87, 1.60) | 0.384 |
| Vitamin D category: ≥30 (sufficient) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Vitamin D category trend | 1.04 (0.84, 1.29) | 0.833 | 1.35 (1.03, 1.77) | 0.068 | 1.04 (0.57, 1.91) | 0.941 | 0.76 (0.54, 1.07) | 0.197 | 0.79 (0.63, 1.00) | 0.101 |
| DII | 1.01 (0.98, 1.05) | 0.606 | 0.96 (0.92, 0.99) | 0.050 | 1.15 (1.01, 1.30) | 0.089 | 1.05 (0.97, 1.14) | 0.330 | 1.05 (1.01, 1.09) | 0.050 |
| DII quartile trend | 1.03 (0.96, 1.10) | 0.606 | 0.93 (0.86, 1.00) | 0.096 | 1.30 (1.00, 1.68) | 0.101 | 1.11 (0.95, 1.30) | 0.305 | 1.09 (1.00, 1.18) | 0.101 |
| DII including alcohol | 1.01 (0.98, 1.04) | 0.634 | 0.96 (0.92, 0.99) | 0.044 | 1.15 (1.01, 1.31) | 0.072 | 1.06 (0.98, 1.14) | 0.212 | 1.05 (1.01, 1.08) | 0.049 |
| 1 p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method | ||||||||||
tbl_uni1_secondary| Table 6: Univariable Associations of Core Covariates With Eczema and Food Allergy | ||||
|---|---|---|---|---|
| Variable | Atopic dermatitis OR (95% CI) | Atopic dermatitis p value1 | Food allergy OR (95% CI) | Food allergy p value1 |
| Age (years) | 0.99 (0.94, 1.04) | 0.837 | 1.00 (0.99, 1.02) | 0.775 |
| Sex: Male | 1 (Ref) | 1 (Ref) | ||
| Sex: Female | 1.68 (0.95, 2.98) | 0.271 | 1.40 (1.09, 1.79) | 0.088 |
| Race/ethnicity: Non-Hispanic White | 1 (Ref) | 1 (Ref) | ||
| Race/ethnicity: Non-Hispanic Black | 1.00 (0.55, 1.83) | 0.993 | 1.57 (1.15, 2.14) | 0.058 |
| Race/ethnicity: Mexican American | 0.19 (0.08, 0.44) | 0.013 | 0.62 (0.48, 0.81) | 0.013 |
| Race/ethnicity: Other Hispanic | 0.21 (0.06, 0.74) | 0.097 | 1.41 (0.94, 2.13) | 0.309 |
| Insurance: No | 1 (Ref) | 1 (Ref) | ||
| Insurance: Yes | 1.57 (0.90, 2.74) | 0.309 | 1.33 (0.94, 1.87) | 0.309 |
| Education: <9 years | 0.10 (0.06, 0.18) | <0.001 | 0.47 (0.23, 0.96) | 0.169 |
| Education: 9–12 years | 0.99 (0.54, 1.82) | 0.993 | 0.64 (0.50, 0.83) | 0.013 |
| Education: >12 years | 1 (Ref) | 1 (Ref) | ||
| Poverty-income ratio | 1.05 (0.92, 1.20) | 0.698 | 1.02 (0.95, 1.11) | 0.770 |
| Food security: Full | 1 (Ref) | 1 (Ref) | ||
| Food security: Marginal | 1.54 (0.95, 2.50) | 0.271 | 0.57 (0.28, 1.16) | 0.309 |
| Food security: Low | 0.95 (0.36, 2.54) | 0.993 | NA | NA |
| Food security: Very low | 1.46 (0.60, 3.52) | 0.659 | NA | NA |
| Smoking status: Never | 1 (Ref) | 1 (Ref) | ||
| Smoking status: Current | 1.69 (1.02, 2.79) | 0.179 | 0.94 (0.70, 1.26) | 0.837 |
| Smoking status: Former | 1.06 (0.37, 3.04) | 0.993 | 1.27 (0.88, 1.83) | 0.487 |
| Physical activity: Sedentary | 1 (Ref) | 1 (Ref) | ||
| Physical activity: Moderate | 1.30 (0.50, 3.34) | 0.775 | 1.23 (0.84, 1.80) | 0.577 |
| Physical activity: Intense | 1.28 (0.57, 2.87) | 0.770 | 1.16 (0.83, 1.61) | 0.659 |
| Alcohol use: No | 1 (Ref) | 1 (Ref) | ||
| Alcohol use: Yes | 1.58 (0.68, 3.67) | 0.563 | 0.77 (0.60, 1.00) | 0.186 |
| Systolic blood pressure category: <120 | 1 (Ref) | 1 (Ref) | ||
| Systolic blood pressure category: 120–129 | 1.05 (0.56, 1.97) | 0.988 | 0.59 (0.38, 0.91) | 0.097 |
| Systolic blood pressure category: 130–139 | 1.02 (0.37, 2.78) | 0.993 | 1.05 (0.61, 1.80) | 0.988 |
| Systolic blood pressure category: 140–159 | 0.97 (0.25, 3.75) | 0.993 | 0.67 (0.30, 1.49) | 0.617 |
| Systolic blood pressure category: ≥160 | 0.00 (0.00, 0.00) | <0.001 | 0.00 (0.00, 0.00) | <0.001 |
| Diastolic blood pressure category: <80 | 1 (Ref) | 1 (Ref) | ||
| Diastolic blood pressure category: 80–89 | 1.60 (0.60, 4.27) | 0.617 | 0.98 (0.65, 1.49) | 0.993 |
| Diastolic blood pressure category: 90–99 | 0.44 (0.10, 1.99) | 0.563 | 1.60 (0.89, 2.86) | 0.309 |
| Diastolic blood pressure category: ≥100 | 0.00 (0.00, 0.00) | <0.001 | 1.69 (0.42, 6.86) | 0.717 |
| BMI category: Underweight (<18.5) | 0.39 (0.07, 2.21) | 0.563 | 1.00 (0.35, 2.80) | 0.993 |
| BMI category: Normal (18.5–24.9) | 1 (Ref) | 1 (Ref) | ||
| BMI category: Overweight (25–29.9) | 0.51 (0.30, 0.86) | 0.097 | 0.94 (0.68, 1.30) | 0.847 |
| BMI category: Obesity I (30–34.9) | 0.81 (0.41, 1.60) | 0.755 | 1.04 (0.71, 1.52) | 0.951 |
| BMI category: Obesity II (35–39.9) | 0.81 (0.33, 2.00) | 0.801 | 0.98 (0.56, 1.69) | 0.993 |
| BMI category: Obesity III (≥40) | 2.22 (1.24, 3.97) | 0.088 | 1.20 (0.76, 1.90) | 0.698 |
| Waist-to-height ratio category: <0.5 | 1 (Ref) | 1 (Ref) | ||
| Waist-to-height ratio category: 0.5–0.59 | 0.81 (0.44, 1.49) | 0.725 | 0.87 (0.62, 1.20) | 0.659 |
| Waist-to-height ratio category: ≥0.6 | 1.22 (0.72, 2.09) | 0.698 | 1.04 (0.73, 1.49) | 0.951 |
| HDL cholesterol (mg/dL) | 1.00 (0.99, 1.01) | 0.725 | 1.00 (0.99, 1.01) | 0.775 |
| Non-HDL cholesterol (mg/dL) | 1.00 (0.99, 1.01) | 0.770 | 1.00 (0.99, 1.00) | 0.318 |
| Total cholesterol (mg/dL) | 1.00 (0.99, 1.01) | 0.775 | 1.00 (0.99, 1.00) | 0.309 |
| HbA1c (%) | 0.71 (0.38, 1.32) | 0.563 | 0.92 (0.76, 1.11) | 0.659 |
| Vitamin D category: <20 (deficient) | 1.54 (0.13, 17.72) | 0.847 | 2.04 (0.92, 4.51) | 0.271 |
| Vitamin D category: 20–29 (insufficient) | 0.79 (0.34, 1.85) | 0.775 | 1.65 (1.11, 2.45) | 0.097 |
| Vitamin D category: ≥30 (sufficient) | 1 (Ref) | 1 (Ref) | ||
| Vitamin D category trend | 1.14 (0.59, 2.17) | 0.837 | 0.66 (0.48, 0.91) | 0.088 |
| DII | 1.07 (0.96, 1.19) | 0.543 | 0.95 (0.91, 0.99) | 0.097 |
| DII quartile trend | 1.12 (0.85, 1.47) | 0.698 | 0.93 (0.84, 1.03) | 0.384 |
| DII including alcohol | 1.06 (0.94, 1.19) | 0.617 | 0.95 (0.92, 0.99) | 0.117 |
| 1 p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method | ||||
tbl_uni2_primary| Table 7: Univariable Associations of Individual Nutritional Covariates With Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Variable | Asthma OR (95% CI) | Asthma p value1 | Allergic rhinitis OR (95% CI) | Allergic rhinitis p value1 | Inflammatory arthritis OR (95% CI) | Inflammatory arthritis p value1 | Diabetes OR (95% CI) | Diabetes p value1 | Hypertension OR (95% CI) | Hypertension p value1 |
| Total energy intake (kcal): Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Total energy intake (kcal): Q2 | 0.92 (0.75, 1.14) | 0.781 | 1.27 (1.00, 1.62) | 0.366 | 0.37 (0.19, 0.73) | 0.100 | 0.83 (0.51, 1.35) | 0.781 | 1.16 (0.89, 1.53) | 0.646 |
| Total energy intake (kcal): Q3 | 0.74 (0.59, 0.93) | 0.167 | 1.05 (0.85, 1.30) | 0.856 | 0.31 (0.16, 0.60) | 0.047 | 0.78 (0.48, 1.27) | 0.655 | 1.08 (0.85, 1.39) | 0.821 |
| Total energy intake (kcal): Q4 | 1.01 (0.80, 1.26) | 0.968 | 1.22 (0.95, 1.57) | 0.524 | 0.51 (0.25, 1.01) | 0.373 | 0.66 (0.36, 1.19) | 0.555 | 1.28 (1.01, 1.61) | 0.308 |
| Carbohydrate intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Carbohydrate intake: Q2 | 0.92 (0.72, 1.18) | 0.821 | 1.35 (1.06, 1.74) | 0.213 | 0.76 (0.40, 1.45) | 0.750 | 0.63 (0.39, 1.00) | 0.371 | 1.14 (0.89, 1.46) | 0.663 |
| Carbohydrate intake: Q3 | 0.83 (0.67, 1.03) | 0.483 | 1.18 (0.92, 1.51) | 0.574 | 0.26 (0.13, 0.49) | 0.019 | 0.44 (0.25, 0.80) | 0.133 | 0.99 (0.78, 1.27) | 0.971 |
| Carbohydrate intake: Q4 | 1.04 (0.82, 1.32) | 0.883 | 1.13 (0.87, 1.47) | 0.703 | 0.73 (0.37, 1.45) | 0.725 | 0.37 (0.21, 0.68) | 0.062 | 1.12 (0.90, 1.39) | 0.653 |
| Protein intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Protein intake: Q2 | 0.84 (0.67, 1.06) | 0.531 | 1.15 (0.87, 1.53) | 0.665 | 0.49 (0.27, 0.90) | 0.237 | 1.23 (0.75, 2.01) | 0.763 | 0.99 (0.79, 1.25) | 0.968 |
| Protein intake: Q3 | 0.89 (0.70, 1.11) | 0.653 | 1.23 (0.96, 1.59) | 0.511 | 0.54 (0.28, 1.02) | 0.382 | 1.22 (0.72, 2.07) | 0.781 | 1.14 (0.90, 1.44) | 0.653 |
| Protein intake: Q4 | 0.80 (0.64, 1.02) | 0.419 | 1.24 (0.94, 1.63) | 0.524 | 0.50 (0.26, 0.94) | 0.291 | 1.15 (0.74, 1.78) | 0.821 | 1.18 (0.95, 1.46) | 0.524 |
| Total fat intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Total fat intake: Q2 | 1.04 (0.81, 1.34) | 0.883 | 1.03 (0.79, 1.35) | 0.900 | 0.48 (0.24, 0.95) | 0.295 | 0.69 (0.41, 1.16) | 0.555 | 0.89 (0.67, 1.17) | 0.750 |
| Total fat intake: Q3 | 0.96 (0.74, 1.25) | 0.883 | 1.09 (0.87, 1.35) | 0.781 | 0.51 (0.30, 0.87) | 0.208 | 0.77 (0.43, 1.38) | 0.727 | 1.11 (0.89, 1.39) | 0.714 |
| Total fat intake: Q4 | 1.06 (0.83, 1.35) | 0.856 | 1.20 (0.94, 1.53) | 0.524 | 0.60 (0.29, 1.21) | 0.534 | 1.23 (0.73, 2.07) | 0.781 | 1.30 (1.02, 1.66) | 0.295 |
| Saturated fat intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Saturated fat intake: Q2 | 0.94 (0.77, 1.14) | 0.816 | 1.09 (0.86, 1.39) | 0.781 | 0.76 (0.38, 1.54) | 0.781 | 0.73 (0.43, 1.23) | 0.610 | 1.01 (0.78, 1.31) | 0.971 |
| Saturated fat intake: Q3 | 0.96 (0.76, 1.23) | 0.883 | 1.12 (0.88, 1.42) | 0.703 | 0.64 (0.33, 1.25) | 0.579 | 0.80 (0.50, 1.28) | 0.703 | 1.01 (0.79, 1.29) | 0.968 |
| Saturated fat intake: Q4 | 0.97 (0.76, 1.24) | 0.898 | 1.21 (0.94, 1.56) | 0.531 | 0.82 (0.42, 1.57) | 0.821 | 0.99 (0.59, 1.66) | 0.971 | 1.29 (1.04, 1.60) | 0.218 |
| Polyunsaturated fat intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Polyunsaturated fat intake: Q2 | 0.94 (0.74, 1.20) | 0.856 | 0.96 (0.75, 1.23) | 0.877 | 0.73 (0.36, 1.48) | 0.740 | 0.86 (0.47, 1.56) | 0.852 | 0.94 (0.77, 1.16) | 0.844 |
| Polyunsaturated fat intake: Q3 | 0.97 (0.79, 1.19) | 0.883 | 0.90 (0.70, 1.16) | 0.750 | 0.86 (0.39, 1.90) | 0.877 | 0.66 (0.39, 1.11) | 0.524 | 0.97 (0.77, 1.21) | 0.883 |
| Polyunsaturated fat intake: Q4 | 1.16 (0.95, 1.43) | 0.531 | 0.96 (0.77, 1.19) | 0.875 | 1.73 (0.86, 3.51) | 0.524 | 0.89 (0.53, 1.50) | 0.860 | 0.85 (0.67, 1.08) | 0.569 |
| Monounsaturated fat intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Monounsaturated fat intake: Q2 | 0.88 (0.72, 1.09) | 0.610 | 0.84 (0.65, 1.09) | 0.578 | 0.64 (0.32, 1.29) | 0.600 | 0.62 (0.35, 1.11) | 0.511 | 0.87 (0.68, 1.10) | 0.614 |
| Monounsaturated fat intake: Q3 | 1.08 (0.89, 1.31) | 0.781 | 0.86 (0.66, 1.12) | 0.638 | 0.71 (0.39, 1.32) | 0.646 | 0.49 (0.28, 0.83) | 0.148 | 0.68 (0.53, 0.87) | 0.082 |
| Monounsaturated fat intake: Q4 | 1.03 (0.82, 1.29) | 0.898 | 0.83 (0.64, 1.06) | 0.524 | 1.58 (0.84, 2.95) | 0.531 | 0.71 (0.42, 1.20) | 0.586 | 0.81 (0.65, 1.01) | 0.407 |
| Omega-3 fat intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Omega-3 fat intake: Q2 | 0.98 (0.79, 1.21) | 0.900 | 1.07 (0.86, 1.32) | 0.834 | 1.61 (0.73, 3.54) | 0.610 | 1.10 (0.61, 1.97) | 0.883 | 0.95 (0.77, 1.16) | 0.852 |
| Omega-3 fat intake: Q3 | 0.90 (0.74, 1.10) | 0.653 | 0.98 (0.78, 1.23) | 0.916 | 0.76 (0.32, 1.83) | 0.821 | 1.26 (0.70, 2.28) | 0.781 | 1.02 (0.85, 1.24) | 0.898 |
| Omega-3 fat intake: Q4 | 1.11 (0.92, 1.35) | 0.646 | 0.86 (0.67, 1.11) | 0.614 | 2.88 (1.29, 6.41) | 0.157 | 1.46 (0.89, 2.38) | 0.524 | 0.96 (0.76, 1.21) | 0.877 |
| Omega-6 fat intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Omega-6 fat intake: Q2 | 0.94 (0.73, 1.21) | 0.856 | 0.97 (0.75, 1.24) | 0.892 | 0.81 (0.38, 1.75) | 0.852 | 0.82 (0.45, 1.49) | 0.821 | 0.84 (0.67, 1.04) | 0.524 |
| Omega-6 fat intake: Q3 | 0.98 (0.78, 1.22) | 0.913 | 0.86 (0.68, 1.09) | 0.600 | 0.95 (0.46, 1.98) | 0.943 | 0.68 (0.41, 1.14) | 0.531 | 0.90 (0.71, 1.14) | 0.725 |
| Omega-6 fat intake: Q4 | 1.13 (0.91, 1.40) | 0.647 | 0.99 (0.79, 1.23) | 0.957 | 1.69 (0.85, 3.37) | 0.524 | 0.86 (0.52, 1.43) | 0.834 | 0.84 (0.67, 1.06) | 0.527 |
| Cholesterol intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Cholesterol intake: Q2 | 0.99 (0.78, 1.26) | 0.968 | 1.15 (0.89, 1.48) | 0.653 | 0.71 (0.39, 1.29) | 0.634 | 0.86 (0.44, 1.68) | 0.861 | 0.96 (0.74, 1.26) | 0.893 |
| Cholesterol intake: Q3 | 0.83 (0.67, 1.03) | 0.483 | 0.97 (0.78, 1.22) | 0.898 | 0.61 (0.31, 1.20) | 0.531 | 1.26 (0.73, 2.18) | 0.748 | 1.19 (0.95, 1.50) | 0.524 |
| Cholesterol intake: Q4 | 0.81 (0.63, 1.03) | 0.483 | 1.07 (0.82, 1.39) | 0.852 | 0.63 (0.33, 1.22) | 0.555 | 1.30 (0.75, 2.27) | 0.703 | 1.39 (1.11, 1.73) | 0.095 |
| Vitamin A intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Vitamin A intake: Q2 | 0.92 (0.74, 1.16) | 0.808 | 0.82 (0.65, 1.04) | 0.511 | 0.91 (0.44, 1.87) | 0.896 | 1.47 (0.84, 2.57) | 0.569 | 1.08 (0.83, 1.40) | 0.842 |
| Vitamin A intake: Q3 | 0.80 (0.65, 1.00) | 0.373 | 0.72 (0.57, 0.91) | 0.125 | 0.90 (0.45, 1.82) | 0.886 | 1.91 (1.13, 3.26) | 0.213 | 1.16 (0.88, 1.54) | 0.653 |
| Vitamin A intake: Q4 | 1.05 (0.84, 1.31) | 0.856 | 0.60 (0.46, 0.77) | 0.019 | 1.22 (0.58, 2.55) | 0.852 | 1.66 (0.86, 3.19) | 0.524 | 1.23 (0.95, 1.60) | 0.524 |
| Beta-carotene intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Beta-carotene intake: Q2 | 1.02 (0.83, 1.26) | 0.902 | 0.82 (0.64, 1.05) | 0.524 | 0.76 (0.39, 1.45) | 0.750 | 1.60 (0.93, 2.75) | 0.481 | 1.08 (0.85, 1.38) | 0.821 |
| Beta-carotene intake: Q3 | 1.06 (0.83, 1.36) | 0.856 | 0.65 (0.51, 0.83) | 0.047 | 1.47 (0.72, 3.02) | 0.653 | 1.54 (0.85, 2.78) | 0.531 | 1.25 (0.97, 1.61) | 0.477 |
| Beta-carotene intake: Q4 | 1.11 (0.89, 1.39) | 0.703 | 0.70 (0.56, 0.88) | 0.081 | 2.05 (1.05, 3.99) | 0.295 | 1.41 (0.82, 2.41) | 0.596 | 1.34 (1.02, 1.76) | 0.295 |
| Vitamin C intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Vitamin C intake: Q2 | 1.17 (0.92, 1.49) | 0.590 | 1.27 (1.03, 1.56) | 0.250 | 1.20 (0.52, 2.80) | 0.861 | 1.44 (0.89, 2.33) | 0.528 | 1.09 (0.83, 1.44) | 0.821 |
| Vitamin C intake: Q3 | 1.23 (1.02, 1.48) | 0.291 | 0.93 (0.69, 1.26) | 0.856 | 1.92 (0.99, 3.74) | 0.373 | 1.06 (0.60, 1.88) | 0.900 | 1.14 (0.86, 1.51) | 0.713 |
| Vitamin C intake: Q4 | 1.23 (1.00, 1.51) | 0.371 | 0.90 (0.68, 1.19) | 0.781 | 2.46 (1.12, 5.40) | 0.255 | 1.18 (0.69, 2.00) | 0.821 | 1.31 (0.98, 1.75) | 0.429 |
| Vitamin E intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Vitamin E intake: Q2 | 0.82 (0.64, 1.05) | 0.524 | 0.97 (0.78, 1.21) | 0.898 | 0.93 (0.50, 1.73) | 0.900 | 0.56 (0.31, 1.01) | 0.373 | 1.08 (0.80, 1.46) | 0.852 |
| Vitamin E intake: Q3 | 0.94 (0.74, 1.20) | 0.856 | 0.82 (0.65, 1.03) | 0.483 | 1.01 (0.53, 1.94) | 0.981 | 0.83 (0.46, 1.50) | 0.821 | 1.10 (0.85, 1.42) | 0.781 |
| Vitamin E intake: Q4 | 0.95 (0.76, 1.20) | 0.871 | 0.67 (0.54, 0.82) | 0.024 | 1.53 (0.77, 3.04) | 0.602 | 1.10 (0.69, 1.77) | 0.868 | 1.15 (0.89, 1.48) | 0.646 |
| Vitamin B6 intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Vitamin B6 intake: Q2 | 0.93 (0.78, 1.11) | 0.745 | 0.97 (0.77, 1.22) | 0.898 | 0.77 (0.35, 1.67) | 0.811 | 1.18 (0.70, 1.99) | 0.821 | 0.88 (0.68, 1.13) | 0.653 |
| Vitamin B6 intake: Q3 | 0.96 (0.78, 1.18) | 0.877 | 0.90 (0.70, 1.16) | 0.759 | 1.51 (0.78, 2.89) | 0.600 | 1.24 (0.77, 1.99) | 0.727 | 1.04 (0.84, 1.29) | 0.875 |
| Vitamin B6 intake: Q4 | 1.10 (0.89, 1.36) | 0.725 | 0.82 (0.66, 1.03) | 0.477 | 2.27 (1.13, 4.56) | 0.242 | 1.17 (0.65, 2.08) | 0.852 | 1.15 (0.91, 1.45) | 0.610 |
| Vitamin B12 intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Vitamin B12 intake: Q2 | 1.02 (0.82, 1.27) | 0.916 | 1.11 (0.90, 1.38) | 0.686 | 0.87 (0.47, 1.59) | 0.856 | 0.74 (0.44, 1.22) | 0.610 | 1.01 (0.75, 1.35) | 0.974 |
| Vitamin B12 intake: Q3 | 0.87 (0.69, 1.09) | 0.600 | 1.03 (0.78, 1.36) | 0.898 | 1.07 (0.59, 1.93) | 0.898 | 0.67 (0.40, 1.12) | 0.524 | 0.93 (0.72, 1.19) | 0.821 |
| Vitamin B12 intake: Q4 | 1.03 (0.83, 1.28) | 0.898 | 1.20 (0.94, 1.52) | 0.524 | 0.59 (0.27, 1.32) | 0.585 | 0.76 (0.42, 1.36) | 0.703 | 1.15 (0.90, 1.48) | 0.646 |
| Folic acid intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Folic acid intake: Q2 | 1.28 (1.03, 1.58) | 0.250 | 0.92 (0.72, 1.16) | 0.787 | 0.96 (0.46, 2.01) | 0.957 | 0.80 (0.51, 1.23) | 0.653 | 1.20 (0.92, 1.57) | 0.559 |
| Folic acid intake: Q3 | 1.05 (0.85, 1.30) | 0.856 | 0.91 (0.72, 1.15) | 0.781 | 1.33 (0.63, 2.82) | 0.781 | 1.22 (0.72, 2.09) | 0.781 | 1.21 (0.96, 1.53) | 0.509 |
| Folic acid intake: Q4 | 1.23 (0.97, 1.55) | 0.477 | 0.98 (0.77, 1.24) | 0.907 | 1.61 (0.79, 3.31) | 0.579 | 1.23 (0.70, 2.15) | 0.786 | 1.44 (1.16, 1.80) | 0.058 |
| Thiamin intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Thiamin intake: Q2 | 0.96 (0.75, 1.23) | 0.883 | 0.93 (0.73, 1.19) | 0.842 | 1.35 (0.58, 3.13) | 0.793 | 1.03 (0.67, 1.60) | 0.935 | 1.04 (0.84, 1.28) | 0.877 |
| Thiamin intake: Q3 | 0.87 (0.70, 1.08) | 0.589 | 0.96 (0.76, 1.20) | 0.877 | 1.21 (0.54, 2.68) | 0.856 | 0.96 (0.57, 1.61) | 0.918 | 1.07 (0.86, 1.32) | 0.821 |
| Thiamin intake: Q4 | 1.17 (0.92, 1.49) | 0.585 | 0.81 (0.64, 1.03) | 0.477 | 2.69 (1.20, 6.01) | 0.213 | 1.12 (0.62, 2.03) | 0.875 | 1.05 (0.85, 1.30) | 0.860 |
| Riboflavin intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Riboflavin intake: Q2 | 0.82 (0.67, 1.01) | 0.376 | 0.85 (0.67, 1.07) | 0.555 | 1.04 (0.51, 2.13) | 0.957 | 1.05 (0.69, 1.60) | 0.898 | 1.05 (0.85, 1.30) | 0.859 |
| Riboflavin intake: Q3 | 0.96 (0.75, 1.22) | 0.877 | 0.90 (0.73, 1.10) | 0.653 | 1.28 (0.63, 2.61) | 0.811 | 1.03 (0.57, 1.86) | 0.958 | 1.04 (0.81, 1.34) | 0.883 |
| Riboflavin intake: Q4 | 1.03 (0.84, 1.27) | 0.892 | 0.61 (0.46, 0.80) | 0.047 | 1.60 (0.80, 3.19) | 0.571 | 1.10 (0.64, 1.88) | 0.877 | 1.05 (0.80, 1.37) | 0.883 |
| Niacin intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Niacin intake: Q2 | 1.05 (0.83, 1.33) | 0.860 | 1.08 (0.85, 1.37) | 0.821 | 0.82 (0.38, 1.78) | 0.856 | 1.58 (0.93, 2.69) | 0.483 | 1.07 (0.83, 1.37) | 0.852 |
| Niacin intake: Q3 | 1.02 (0.82, 1.26) | 0.935 | 1.16 (0.93, 1.45) | 0.571 | 1.50 (0.76, 2.94) | 0.614 | 1.42 (0.78, 2.58) | 0.614 | 1.03 (0.85, 1.24) | 0.893 |
| Niacin intake: Q4 | 1.16 (0.94, 1.43) | 0.544 | 0.94 (0.73, 1.20) | 0.852 | 1.41 (0.74, 2.68) | 0.653 | 1.11 (0.63, 1.96) | 0.877 | 1.02 (0.83, 1.24) | 0.916 |
| Iron intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Iron intake: Q2 | 0.85 (0.70, 1.03) | 0.483 | 1.01 (0.80, 1.28) | 0.957 | 0.73 (0.43, 1.24) | 0.614 | 1.00 (0.58, 1.73) | 0.996 | 1.04 (0.83, 1.31) | 0.877 |
| Iron intake: Q3 | 0.88 (0.71, 1.10) | 0.638 | 1.13 (0.88, 1.44) | 0.686 | 0.44 (0.22, 0.90) | 0.255 | 0.78 (0.43, 1.41) | 0.759 | 1.01 (0.82, 1.25) | 0.962 |
| Iron intake: Q4 | 0.84 (0.67, 1.05) | 0.524 | 1.20 (0.93, 1.56) | 0.544 | 0.57 (0.28, 1.19) | 0.524 | 0.89 (0.48, 1.65) | 0.877 | 0.92 (0.74, 1.15) | 0.787 |
| Magnesium intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Magnesium intake: Q2 | 0.91 (0.72, 1.16) | 0.781 | 0.95 (0.72, 1.26) | 0.877 | 1.28 (0.54, 3.05) | 0.842 | 0.85 (0.43, 1.69) | 0.856 | 0.96 (0.74, 1.25) | 0.883 |
| Magnesium intake: Q3 | 0.94 (0.75, 1.17) | 0.844 | 0.87 (0.69, 1.09) | 0.600 | 1.79 (0.83, 3.88) | 0.524 | 1.16 (0.69, 1.95) | 0.844 | 0.95 (0.76, 1.19) | 0.868 |
| Magnesium intake: Q4 | 1.16 (0.93, 1.44) | 0.571 | 0.72 (0.55, 0.94) | 0.213 | 2.88 (1.34, 6.18) | 0.131 | 1.23 (0.70, 2.14) | 0.786 | 1.04 (0.80, 1.36) | 0.883 |
| Zinc intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Zinc intake: Q2 | 0.94 (0.75, 1.18) | 0.852 | 1.03 (0.82, 1.30) | 0.883 | 0.96 (0.46, 2.01) | 0.957 | 0.75 (0.39, 1.45) | 0.741 | 0.66 (0.51, 0.86) | 0.082 |
| Zinc intake: Q3 | 1.14 (0.90, 1.43) | 0.646 | 1.08 (0.85, 1.36) | 0.821 | 0.80 (0.38, 1.68) | 0.826 | 0.89 (0.45, 1.75) | 0.877 | 0.81 (0.67, 0.99) | 0.295 |
| Zinc intake: Q4 | 1.09 (0.87, 1.37) | 0.781 | 0.91 (0.70, 1.18) | 0.781 | 1.53 (0.77, 3.02) | 0.602 | 1.13 (0.66, 1.94) | 0.859 | 0.88 (0.71, 1.09) | 0.608 |
| Selenium intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Selenium intake: Q2 | 1.23 (0.99, 1.53) | 0.391 | 1.17 (0.89, 1.52) | 0.632 | 1.17 (0.54, 2.54) | 0.875 | 1.37 (0.79, 2.37) | 0.634 | 1.05 (0.82, 1.33) | 0.877 |
| Selenium intake: Q3 | 1.10 (0.89, 1.36) | 0.725 | 0.88 (0.70, 1.11) | 0.653 | 1.05 (0.51, 2.18) | 0.935 | 1.47 (0.89, 2.42) | 0.524 | 0.97 (0.78, 1.20) | 0.883 |
| Selenium intake: Q4 | 1.39 (1.11, 1.73) | 0.100 | 0.97 (0.77, 1.23) | 0.896 | 2.11 (1.07, 4.14) | 0.291 | 1.09 (0.63, 1.87) | 0.883 | 0.87 (0.69, 1.11) | 0.632 |
| Fiber intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Fiber intake: Q2 | 0.95 (0.75, 1.20) | 0.870 | 0.92 (0.72, 1.19) | 0.821 | 1.25 (0.50, 3.18) | 0.856 | 1.00 (0.58, 1.71) | 0.990 | 1.18 (0.92, 1.53) | 0.579 |
| Fiber intake: Q3 | 1.00 (0.80, 1.27) | 0.978 | 0.80 (0.65, 0.98) | 0.295 | 1.23 (0.52, 2.91) | 0.856 | 1.17 (0.67, 2.05) | 0.842 | 1.09 (0.85, 1.41) | 0.811 |
| Fiber intake: Q4 | 1.18 (0.92, 1.53) | 0.579 | 0.69 (0.54, 0.88) | 0.094 | 3.02 (1.32, 6.94) | 0.156 | 1.19 (0.68, 2.09) | 0.821 | 1.45 (1.16, 1.82) | 0.058 |
| Caffeine intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Caffeine intake: Q2 | 0.78 (0.64, 0.96) | 0.213 | 0.83 (0.66, 1.04) | 0.509 | 0.38 (0.19, 0.75) | 0.125 | 0.87 (0.48, 1.57) | 0.856 | 0.85 (0.65, 1.11) | 0.603 |
| Caffeine intake: Q3 | 0.72 (0.58, 0.89) | 0.081 | 0.47 (0.37, 0.60) | <0.001 | 0.73 (0.41, 1.29) | 0.646 | 0.98 (0.59, 1.63) | 0.963 | 0.88 (0.65, 1.18) | 0.734 |
| Caffeine intake: Q4 | 0.95 (0.76, 1.18) | 0.856 | 0.64 (0.50, 0.83) | 0.047 | 0.46 (0.22, 0.96) | 0.295 | 1.21 (0.72, 2.04) | 0.786 | 0.88 (0.69, 1.12) | 0.653 |
| 1 p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method | ||||||||||
tbl_uni2_secondary| Table 8: Univariable Associations of Individual Nutritional Covariates With Eczema and Food Allergy | ||||
|---|---|---|---|---|
| Variable | Atopic dermatitis OR (95% CI) | Atopic dermatitis p value1 | Food allergy OR (95% CI) | Food allergy p value1 |
| Total energy intake (kcal): Q1 | 1 (Ref) | 1 (Ref) | ||
| Total energy intake (kcal): Q2 | 1.34 (0.71, 2.52) | 0.905 | 1.09 (0.69, 1.71) | 0.921 |
| Total energy intake (kcal): Q3 | 0.85 (0.43, 1.67) | 0.921 | 1.05 (0.74, 1.49) | 0.921 |
| Total energy intake (kcal): Q4 | 0.78 (0.45, 1.36) | 0.905 | 1.04 (0.69, 1.57) | 0.939 |
| Carbohydrate intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Carbohydrate intake: Q2 | 1.42 (0.57, 3.51) | 0.909 | 1.13 (0.76, 1.66) | 0.921 |
| Carbohydrate intake: Q3 | 1.36 (0.53, 3.47) | 0.921 | 1.00 (0.63, 1.57) | 0.990 |
| Carbohydrate intake: Q4 | 0.84 (0.51, 1.39) | 0.921 | 1.16 (0.79, 1.71) | 0.909 |
| Protein intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Protein intake: Q2 | 0.95 (0.45, 2.04) | 0.942 | 1.28 (0.85, 1.92) | 0.892 |
| Protein intake: Q3 | 0.76 (0.33, 1.72) | 0.921 | 1.42 (0.98, 2.05) | 0.727 |
| Protein intake: Q4 | 0.67 (0.36, 1.25) | 0.892 | 0.92 (0.61, 1.38) | 0.921 |
| Total fat intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Total fat intake: Q2 | 0.74 (0.30, 1.85) | 0.921 | 1.11 (0.71, 1.75) | 0.921 |
| Total fat intake: Q3 | 0.93 (0.44, 1.93) | 0.937 | 1.08 (0.68, 1.72) | 0.921 |
| Total fat intake: Q4 | 0.62 (0.38, 1.04) | 0.727 | 1.11 (0.76, 1.60) | 0.921 |
| Saturated fat intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Saturated fat intake: Q2 | 0.77 (0.39, 1.52) | 0.909 | 1.04 (0.76, 1.43) | 0.921 |
| Saturated fat intake: Q3 | 0.87 (0.47, 1.62) | 0.921 | 1.17 (0.76, 1.80) | 0.921 |
| Saturated fat intake: Q4 | 0.74 (0.42, 1.30) | 0.905 | 0.86 (0.59, 1.26) | 0.909 |
| Polyunsaturated fat intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Polyunsaturated fat intake: Q2 | 0.65 (0.31, 1.34) | 0.892 | 0.85 (0.60, 1.21) | 0.905 |
| Polyunsaturated fat intake: Q3 | 1.05 (0.51, 2.16) | 0.942 | 0.91 (0.64, 1.31) | 0.921 |
| Polyunsaturated fat intake: Q4 | 1.16 (0.67, 1.99) | 0.921 | 0.71 (0.53, 0.96) | 0.641 |
| Monounsaturated fat intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Monounsaturated fat intake: Q2 | 1.52 (0.73, 3.16) | 0.892 | 1.09 (0.70, 1.69) | 0.921 |
| Monounsaturated fat intake: Q3 | 1.50 (0.75, 2.97) | 0.892 | 1.16 (0.76, 1.76) | 0.921 |
| Monounsaturated fat intake: Q4 | 1.51 (1.00, 2.29) | 0.669 | 0.98 (0.69, 1.38) | 0.942 |
| Omega-3 fat intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Omega-3 fat intake: Q2 | 0.74 (0.30, 1.85) | 0.921 | 1.18 (0.91, 1.54) | 0.892 |
| Omega-3 fat intake: Q3 | 1.33 (0.70, 2.51) | 0.905 | 0.94 (0.69, 1.30) | 0.921 |
| Omega-3 fat intake: Q4 | 1.21 (0.77, 1.90) | 0.905 | 0.93 (0.63, 1.36) | 0.921 |
| Omega-6 fat intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Omega-6 fat intake: Q2 | 0.90 (0.46, 1.75) | 0.921 | 0.77 (0.55, 1.07) | 0.802 |
| Omega-6 fat intake: Q3 | 1.24 (0.63, 2.42) | 0.921 | 0.82 (0.55, 1.20) | 0.905 |
| Omega-6 fat intake: Q4 | 1.35 (0.80, 2.29) | 0.892 | 0.72 (0.55, 0.96) | 0.641 |
| Cholesterol intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Cholesterol intake: Q2 | 0.81 (0.37, 1.73) | 0.921 | 1.34 (0.87, 2.06) | 0.892 |
| Cholesterol intake: Q3 | 0.63 (0.30, 1.35) | 0.892 | 0.95 (0.64, 1.40) | 0.921 |
| Cholesterol intake: Q4 | 0.70 (0.41, 1.19) | 0.892 | 1.11 (0.72, 1.71) | 0.921 |
| Vitamin A intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Vitamin A intake: Q2 | 0.87 (0.49, 1.55) | 0.921 | 0.86 (0.61, 1.21) | 0.907 |
| Vitamin A intake: Q3 | 1.09 (0.62, 1.91) | 0.921 | 0.89 (0.60, 1.33) | 0.921 |
| Vitamin A intake: Q4 | 1.01 (0.50, 2.01) | 0.990 | 0.95 (0.67, 1.35) | 0.921 |
| Beta-carotene intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Beta-carotene intake: Q2 | 1.40 (0.76, 2.60) | 0.905 | 0.69 (0.48, 0.99) | 0.669 |
| Beta-carotene intake: Q3 | 0.83 (0.36, 1.91) | 0.921 | 0.82 (0.66, 1.04) | 0.802 |
| Beta-carotene intake: Q4 | 1.21 (0.62, 2.35) | 0.921 | 0.73 (0.51, 1.04) | 0.727 |
| Vitamin C intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Vitamin C intake: Q2 | 1.63 (0.70, 3.83) | 0.892 | 0.75 (0.52, 1.07) | 0.802 |
| Vitamin C intake: Q3 | 1.31 (0.55, 3.11) | 0.921 | 0.66 (0.44, 0.98) | 0.669 |
| Vitamin C intake: Q4 | 1.26 (0.74, 2.15) | 0.905 | 0.80 (0.52, 1.22) | 0.905 |
| Vitamin E intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Vitamin E intake: Q2 | 0.92 (0.47, 1.81) | 0.921 | 1.02 (0.72, 1.45) | 0.942 |
| Vitamin E intake: Q3 | 1.04 (0.50, 2.18) | 0.942 | 0.94 (0.65, 1.37) | 0.921 |
| Vitamin E intake: Q4 | 0.85 (0.46, 1.56) | 0.921 | 0.71 (0.52, 0.97) | 0.641 |
| Vitamin B6 intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Vitamin B6 intake: Q2 | 0.88 (0.36, 2.14) | 0.921 | 0.98 (0.69, 1.38) | 0.942 |
| Vitamin B6 intake: Q3 | 1.50 (0.83, 2.72) | 0.892 | 1.33 (0.89, 1.99) | 0.892 |
| Vitamin B6 intake: Q4 | 1.34 (0.58, 3.08) | 0.921 | 1.08 (0.68, 1.73) | 0.921 |
| Vitamin B12 intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Vitamin B12 intake: Q2 | 0.77 (0.33, 1.83) | 0.921 | 0.93 (0.68, 1.26) | 0.921 |
| Vitamin B12 intake: Q3 | 0.64 (0.28, 1.48) | 0.905 | 0.83 (0.59, 1.19) | 0.905 |
| Vitamin B12 intake: Q4 | 0.99 (0.46, 2.13) | 0.990 | 0.61 (0.39, 0.95) | 0.641 |
| Folic acid intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Folic acid intake: Q2 | 0.66 (0.39, 1.13) | 0.802 | 1.59 (1.08, 2.35) | 0.641 |
| Folic acid intake: Q3 | 0.81 (0.45, 1.46) | 0.921 | 1.11 (0.76, 1.60) | 0.921 |
| Folic acid intake: Q4 | 1.09 (0.55, 2.16) | 0.921 | 1.47 (1.04, 2.08) | 0.641 |
| Thiamin intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Thiamin intake: Q2 | 1.12 (0.55, 2.27) | 0.921 | 0.94 (0.64, 1.39) | 0.921 |
| Thiamin intake: Q3 | 1.04 (0.68, 1.59) | 0.942 | 1.16 (0.82, 1.63) | 0.909 |
| Thiamin intake: Q4 | 1.12 (0.49, 2.59) | 0.921 | 1.04 (0.73, 1.49) | 0.937 |
| Riboflavin intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Riboflavin intake: Q2 | 1.44 (0.79, 2.62) | 0.892 | 0.90 (0.66, 1.23) | 0.921 |
| Riboflavin intake: Q3 | 0.79 (0.42, 1.49) | 0.920 | 1.05 (0.71, 1.56) | 0.921 |
| Riboflavin intake: Q4 | 1.43 (0.64, 3.22) | 0.905 | 1.00 (0.72, 1.39) | 0.996 |
| Niacin intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Niacin intake: Q2 | 1.10 (0.65, 1.86) | 0.921 | 1.26 (0.87, 1.81) | 0.892 |
| Niacin intake: Q3 | 1.18 (0.81, 1.73) | 0.905 | 1.15 (0.82, 1.64) | 0.909 |
| Niacin intake: Q4 | 1.27 (0.68, 2.36) | 0.909 | 1.05 (0.72, 1.55) | 0.921 |
| Iron intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Iron intake: Q2 | 0.98 (0.45, 2.14) | 0.981 | 1.30 (0.89, 1.88) | 0.892 |
| Iron intake: Q3 | 0.88 (0.38, 2.04) | 0.921 | 1.23 (0.94, 1.61) | 0.802 |
| Iron intake: Q4 | 0.71 (0.36, 1.41) | 0.905 | 1.06 (0.72, 1.55) | 0.921 |
| Magnesium intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Magnesium intake: Q2 | 0.75 (0.45, 1.25) | 0.892 | 0.85 (0.56, 1.27) | 0.909 |
| Magnesium intake: Q3 | 0.78 (0.49, 1.25) | 0.905 | 0.91 (0.66, 1.27) | 0.921 |
| Magnesium intake: Q4 | 1.13 (0.53, 2.39) | 0.921 | 0.68 (0.49, 0.95) | 0.641 |
| Zinc intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Zinc intake: Q2 | 0.75 (0.41, 1.37) | 0.905 | 1.41 (0.93, 2.13) | 0.802 |
| Zinc intake: Q3 | 1.03 (0.62, 1.73) | 0.942 | 1.25 (0.80, 1.95) | 0.905 |
| Zinc intake: Q4 | 1.02 (0.50, 2.07) | 0.981 | 1.28 (0.87, 1.89) | 0.892 |
| Selenium intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Selenium intake: Q2 | 1.35 (0.75, 2.44) | 0.905 | 1.02 (0.72, 1.45) | 0.942 |
| Selenium intake: Q3 | 1.69 (1.00, 2.87) | 0.669 | 0.94 (0.60, 1.47) | 0.921 |
| Selenium intake: Q4 | 1.14 (0.59, 2.23) | 0.921 | 1.02 (0.74, 1.42) | 0.942 |
| Fiber intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Fiber intake: Q2 | 1.53 (0.69, 3.36) | 0.905 | 0.71 (0.48, 1.04) | 0.727 |
| Fiber intake: Q3 | 1.21 (0.50, 2.92) | 0.921 | 0.72 (0.50, 1.03) | 0.727 |
| Fiber intake: Q4 | 1.18 (0.54, 2.55) | 0.921 | 0.66 (0.49, 0.88) | 0.641 |
| Caffeine intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Caffeine intake: Q2 | 0.94 (0.42, 2.14) | 0.942 | 0.84 (0.62, 1.13) | 0.892 |
| Caffeine intake: Q3 | 1.04 (0.52, 2.09) | 0.942 | 0.84 (0.59, 1.20) | 0.905 |
| Caffeine intake: Q4 | 1.38 (0.59, 3.25) | 0.909 | 0.90 (0.62, 1.29) | 0.921 |
| 1 p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method | ||||
Multivariable regression analyses
Survey-weighted multivariable logistic regression models were used to assess the association between DII score and each disease outcome after adjustment for covariates of interest. Three models were generated. Model 1 adjusted for age and sex. Model 2 additionally adjusted for race/ethnicity, insurance status, poverty-income ratio, and education level. Model 3 further adjusted for BMI, physical activity, and smoking status. Results are shown in Figure 2 for asthma, allergic rhinitis, inflammatory arthritis, diabetes, and hypertension and in Figure 3 for eczema and food allergy.
Overall, there were few significant associations across the models. For allergic rhinitis, higher DII was associated with lower odds in Model 1 (OR 0.95, 95% CI 0.91–0.99; p=0.01), but this association was not statistically significant in Model 2 (OR 0.96, 95% CI 0.92–1.00; p=0.054) or Model 3 (OR 0.97, 95% CI 0.92–1.01; p=0.113). Among cardiometabolic outcomes, higher DII was associated with increased odds of hypertension in Model 1 (OR 1.07, 95% CI 1.03–1.12; p<0.001) and Model 2 (OR 1.06, 95% CI 1.01–1.10; p=0.011). However, this association was not statistically significant in Model 3 (OR 1.03, 95% CI 0.99–1.08; p=0.149). For secondary outcomes, higher DII was associated with decreased odds of food allergy only in Model 1 (OR 0.93, 95% CI 0.89–0.98; p=0.006).
# Adjusted multivariable models----------------------------------------------------------------
## Variables for models 1, 2, 3
covariates_m1 <- c("age", "sex")
covariates_m2 <- c("age", "sex", "race", "insur", "pir", "educ")
covariates_m3 <- c("age", "sex", "race", "insur", "pir", "educ", "bmi_cat", "phys_act", "smoking")
## Helper function to build regression formulas
make_dii_formula <- function(outcome_var, covariates) {
rhs <- c(covariates, "dii")
as.formula(
paste(outcome_var, "~", paste(rhs, collapse = " + "))
)
}
## Fit models for primary outcomes
fit_primary_models <- imap(
outcome_list,
~ list(
label = .x$label,
design = .x$design,
m1 = svyglm(
make_dii_formula(.x$var, covariates_m1),
design = .x$design,
family = quasibinomial()
),
m2 = svyglm(
make_dii_formula(.x$var, covariates_m2),
design = .x$design,
family = quasibinomial()
),
m3 = svyglm(
make_dii_formula(.x$var, covariates_m3),
design = .x$design,
family = quasibinomial()
)
)
)
## Fit models for secondary outcomes
fit_secondary_models <- imap(
outcome_list_secondary,
~ list(
label = .x$label,
design = .x$design,
m1 = svyglm(
make_dii_formula(.x$var, covariates_m1),
design = .x$design,
family = quasibinomial()
),
m2 = svyglm(
make_dii_formula(.x$var, covariates_m2),
design = .x$design,
family = quasibinomial()
),
m3 = svyglm(
make_dii_formula(.x$var, covariates_m3),
design = .x$design,
family = quasibinomial()
)
)
)
# Extract adjusted multivariable associations from models 1-3----------------------------------
extract_dii_from_model <- function(fit, outcome_label, model_label) {
td <- broom::tidy(fit, exponentiate = TRUE, conf.int = TRUE) |>
filter(term == "dii")
if (nrow(td) == 0) {
return(tibble(
outcome_lab = outcome_label,
model = model_label,
OR = NA_real_,
LCL = NA_real_,
UCL = NA_real_,
p = NA_real_,
`OR (95% CI)` = "NE",
`p value` = NA_character_,
is_ne = TRUE
))
}
OR <- td$estimate[1]
LCL <- td$conf.low[1]
UCL <- td$conf.high[1]
p <- td$p.value[1]
is_ne <- !is.finite(OR) | !is.finite(LCL) | !is.finite(UCL) |
OR <= 0 | LCL <= 0 | UCL <= 0 |
is.na(OR) | is.na(LCL) | is.na(UCL) |
abs(log(OR)) > 10 | abs(log(LCL)) > 10 | abs(log(UCL)) > 10
if (is_ne) {
or_ci_str <- "NE"
p_str <- NA_character_
} else {
or_ci_str <- sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
p_str <- scales::pvalue(p, accuracy = 0.001)
}
tibble(
outcome_lab = outcome_label,
model = model_label,
OR = ifelse(is_ne, NA_real_, OR),
LCL = ifelse(is_ne, NA_real_, LCL),
UCL = ifelse(is_ne, NA_real_, UCL),
p = ifelse(is_ne, NA_real_, p),
`OR (95% CI)` = or_ci_str,
`p value` = p_str,
is_ne = is_ne
)
}
## Primary outcomes in adjusted models 1–3
dii_primary_models_long <- map_dfr(
fit_primary_models,
function(m) {
bind_rows(
extract_dii_from_model(m$m1, m$label, "Model 1"),
extract_dii_from_model(m$m2, m$label, "Model 2"),
extract_dii_from_model(m$m3, m$label, "Model 3")
)
}
)
## Secondary outcomes in adjusted models 1–3
dii_secondary_models_long <- map_dfr(
fit_secondary_models,
function(m) {
bind_rows(
extract_dii_from_model(m$m1, m$label, "Model 1"),
extract_dii_from_model(m$m2, m$label, "Model 2"),
extract_dii_from_model(m$m3, m$label, "Model 3")
)
}
)
# Forest plots for multivariable associations--------------------------------------------------
## Helper function to construct label columns for OR and p values
add_or_p_labels <- function(df) {
df |>
mutate(
or_label = if_else(`OR (95% CI)` == "NE", "NE", `OR (95% CI)`),
p_label = if_else(
is.na(`p value`),
"",
paste0("p = ", `p value`)
)
)
}
## DII associations with primary outcomes
dii_primary_plot_data <- dii_primary_models_long |>
add_or_p_labels() |>
mutate(
outcome_lab = factor(
outcome_lab,
levels = c(
"Asthma",
"Allergic rhinitis",
"Inflammatory arthritis",
"Diabetes",
"Hypertension"
)
),
# Adjust model levels (more-adjusted model on top)
model_label = factor(
model,
levels = c("Model 3", "Model 2", "Model 1")
)
)
## Compute axis limits
dii_primary_limits <- dii_primary_plot_data |>
filter(!is_ne) |>
summarise(
xmin = min(LCL, na.rm = TRUE),
xmax = max(UCL, na.rm = TRUE)
)
x_min_primary <- pmax(0.1, dii_primary_limits$xmin * 0.9)
x_max_primary <- pmin(2.5, dii_primary_limits$xmax * 1.1)
dii_primary_forest_plot <- dii_primary_plot_data |>
filter(!is_ne) |>
ggplot(
aes(
x = OR,
y = model_label
)
) +
geom_point() +
geom_errorbarh(aes(xmin = LCL, xmax = UCL), height = 0.15) +
geom_vline(xintercept = 1, linetype = "dashed") +
geom_text(
aes(
label = paste0(or_label, "\n", p_label),
x = pmin(UCL, x_max_primary * 0.95)
),
hjust = -0.05,
size = 3.5
) +
scale_x_continuous(
limits = c(x_min_primary, x_max_primary),
expand = expansion(mult = c(0.02, 0.1))
) +
facet_wrap(~ outcome_lab, ncol = 2) +
labs(
x = "Adjusted odds ratio for Dietary Inflammatory Index",
y = NULL,
title = "Figure 2: Adjusted Multivariable Associations Between Dietary Inflammatory Index\nand Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, or Hypertension"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 13, hjust = 0.5),
strip.text = element_text(face = "bold", size = 11),
axis.text.y = element_text(face = "bold", size = 10),
axis.text.x = element_text(size = 10),
axis.title.x = element_text(face = "bold", size = 11),
panel.spacing = grid::unit(0.4, "lines")
)
## DII associations with secondary outcomes
dii_secondary_plot_data <- dii_secondary_models_long |>
mutate(
outcome_lab = recode(
outcome_lab,
"Atopic dermatitis" = "Eczema"
)
) |>
add_or_p_labels() |>
mutate(
outcome_lab = factor(
outcome_lab,
levels = c("Eczema", "Food allergy")
),
model_label = factor(
model,
levels = c("Model 3", "Model 2", "Model 1")
)
)
dii_secondary_limits <- dii_secondary_plot_data |>
filter(!is_ne) |>
summarise(
xmin = min(LCL, na.rm = TRUE),
xmax = max(UCL, na.rm = TRUE)
)
x_min_secondary <- pmax(0.1, dii_secondary_limits$xmin * 0.9)
x_max_secondary <- pmin(2.5, dii_secondary_limits$xmax * 1.1)
dii_secondary_forest_plot <- dii_secondary_plot_data |>
filter(!is_ne) |>
ggplot(
aes(
x = OR,
y = model_label
)
) +
geom_point() +
geom_errorbarh(aes(xmin = LCL, xmax = UCL), height = 0.15) +
geom_vline(xintercept = 1, linetype = "dashed") +
geom_text(
aes(
label = paste0(or_label, "\n", p_label),
x = pmin(UCL, x_max_secondary * 0.95)
),
hjust = -0.05,
size = 3.5
) +
scale_x_continuous(
limits = c(x_min_secondary, x_max_secondary),
expand = expansion(mult = c(0.02, 0.1))
) +
facet_wrap(~ outcome_lab, ncol = 1) +
labs(
x = "Adjusted odds ratio for Dietary Inflammatory Index",
y = NULL,
title = "Figure 3: Adjusted Multivariable Associations Between Dietary Inflammatory Index\nand Eczema or Food Allergy",
caption = "Note: Model 3 was not estimatable for eczema because of sparse outcome counts."
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 13, hjust = 0.5),
strip.text = element_text(face = "bold", size = 11),
axis.text.y = element_text(face = "bold", size = 10),
axis.text.x = element_text(size = 10),
axis.title.x = element_text(face = "bold", size = 11),
panel.spacing = grid::unit(0.4, "lines")
)
# Print forest plots---------------------------------------------------------------------------
dii_primary_forest_plotdii_secondary_forest_plotEffect modification analyses
Effect modification analyses were performed to assess whether the association between DII and each disease outcome differed by sex, BMI category, or vitamin D status. Results are shown in Table 9 for asthma, allergic rhinitis, inflammatory arthritis, diabetes, and hypertension, and in Table 10 for eczema and food allergy. Overall, only a few interactions were statistically significant and generally without a clear trend across categories. For inflammatory arthritis, BMI significantly modified the association (p<0.001), but without a clear trend across BMI categories. For diabetes, vitamin D status significantly modified the association (p<0.001), with those with vitamin D deficiency exhibiting a higher association between DII and diabetes (OR 1.70, 95% CI 1.13–2.57). However, those with vitamin D insufficiency exhibited a reduced association (OR 0.81, 95% CI 0.65–0.99). For secondary outcomes, eczema demonstrated a significant interaction with BMI (p=0.012), with an increased association seen among those who were underweight (OR 2.20, 95% CI 1.25–3.88). No significant interaction terms were observed for sex.
# Effect modification analyses-----------------------------------------------------------------
effectmod_modifiers <- tibble(
modifier_var = c("sex", "bmi_cat", "vitD25oh_cat"),
modifier_label = c("Sex", "BMI category", "Vitamin D category")
)
fit_dii_interaction_model <- function(outcome_info, covariates_base, modifier_var) {
outcome_var <- outcome_info$var
base_covars <- covariates_base
if (!(modifier_var %in% base_covars)) {
base_covars <- c(base_covars, modifier_var)
}
rhs_terms <- c("dii", modifier_var, paste0("dii:", modifier_var), base_covars)
rhs_terms <- unique(rhs_terms)
f <- as.formula(
paste(
outcome_var,
"~",
paste(rhs_terms, collapse = " + ")
)
)
svyglm(
f,
design = outcome_info$design,
family = quasibinomial()
)
}
run_dii_effectmod <- function(fit, outcome_label, modifier_var, modifier_label) {
cf <- coef(fit)
V <- vcov(fit)
mf <- model.frame(fit)
mod <- mf[[modifier_var]]
if (!is.factor(mod)) {
mod <- factor(mod)
}
levels_mod <- levels(mod)
int_pattern1 <- paste0("^dii:", modifier_var)
int_pattern2 <- paste0("^", modifier_var, ".*:dii")
interaction_terms <- names(cf)[
grepl(int_pattern1, names(cf)) | grepl(int_pattern2, names(cf))
]
if (length(interaction_terms) > 0) {
beta_int <- cf[interaction_terms]
V_int <- V[interaction_terms, interaction_terms, drop = FALSE]
stat <- as.numeric(t(beta_int) %*% solve(V_int) %*% beta_int)
df <- length(beta_int)
p_int <- 1 - pchisq(stat, df = df)
} else {
p_int <- NA_real_
}
map_dfr(
seq_along(levels_mod),
function(j) {
lvl <- levels_mod[j]
if (!("dii" %in% names(cf))) {
return(tibble())
}
beta_dii <- cf[["dii"]]
var_dii <- V["dii", "dii"]
if (j == 1L) {
beta <- beta_dii
var <- var_dii
} else {
int_name_candidates <- c(
paste0("dii:", modifier_var, lvl),
paste0(modifier_var, lvl, ":dii")
)
int_name <- int_name_candidates[int_name_candidates %in% names(cf)]
if (length(int_name) == 0L) {
return(tibble())
}
int_name <- int_name[1]
beta_int <- cf[[int_name]]
var_int <- V[int_name, int_name]
covar <- V["dii", int_name]
beta <- beta_dii + beta_int
var <- var_dii + var_int + 2 * covar
}
se <- sqrt(var)
OR <- exp(beta)
LCL <- exp(beta - 1.96 * se)
UCL <- exp(beta + 1.96 * se)
is_ne <- !is.finite(OR) | !is.finite(LCL) | !is.finite(UCL) |
OR <= 0 | LCL <= 0 | UCL <= 0 |
is.na(OR) | is.na(LCL) | is.na(UCL) |
abs(log(OR)) > 10 | abs(log(LCL)) > 10 | abs(log(UCL)) > 10
if (is_ne) {
or_ci_str <- "NE"
} else {
or_ci_str <- sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
}
tibble(
outcome_lab = outcome_label,
modifier_var = modifier_var,
modifier_label = modifier_label,
level = lvl,
OR = ifelse(is_ne, NA_real_, OR),
LCL = ifelse(is_ne, NA_real_, LCL),
UCL = ifelse(is_ne, NA_real_, UCL),
is_ne = is_ne,
`OR (95% CI)` = or_ci_str,
p_int = p_int
)
}
)
}
## Level order for BMI and Vitamin D
bmi_level_order <- c(
"Underweight (<18.5)",
"Normal (18.5–24.9)",
"Overweight (25–29.9)",
"Obesity I (30–34.9)",
"Obesity II (35–39.9)",
"Obesity III (≥40)"
)
vitD_level_order <- c(
"<20 (deficient)",
"20–29 (insufficient)",
"≥30 (sufficient)"
)
## Helper function to build display table
build_effectmod_display <- function(df) {
df |>
arrange(outcome_lab, modifier_var, level) |>
group_by(outcome_lab, modifier_var, modifier_label) |>
group_split() |>
map_dfr(function(grp) {
out_name <- grp$outcome_lab[[1]]
mod_lab <- grp$modifier_label[[1]]
### Fix modifier text
mod_label_clean <- case_when(
mod_lab == "BMI category" ~ "by BMI category",
mod_lab == "Vitamin D category" ~ "by Vitamin D category",
TRUE ~ paste0("by ", tolower(mod_lab))
)
p_int_vals <- grp$`p for interaction`[!is.na(grp$`p for interaction`)]
p_int_disp <- ifelse(length(p_int_vals) == 0, NA_character_, p_int_vals[1])
header_row <- tibble(
Outcome = out_name,
`Modifier level` = paste0(out_name, " ", mod_label_clean),
`OR (95% CI)` = "",
`p for interaction` = p_int_disp
)
level_rows <- grp |>
transmute(
Outcome = out_name,
`Modifier level` = as.character(level),
`OR (95% CI)` = `OR (95% CI)`,
`p for interaction` = ""
)
bind_rows(header_row, level_rows)
})
}
## Effect modification for primary outcomes
effectmod_primary_long <- map_dfr(
names(outcome_list),
function(out_nm) {
outcome_info <- outcome_list[[out_nm]]
map_dfr(
seq_len(nrow(effectmod_modifiers)),
function(i) {
mod_var <- effectmod_modifiers$modifier_var[i]
mod_label <- effectmod_modifiers$modifier_label[i]
fit_int <- fit_dii_interaction_model(
outcome_info = outcome_info,
covariates_base = covariates_m3,
modifier_var = mod_var
)
run_dii_effectmod(
fit = fit_int,
outcome_label = outcome_info$label,
modifier_var = mod_var,
modifier_label = mod_label
)
}
)
}
) |>
mutate(
`p for interaction` = ifelse(
is.na(p_int),
NA_character_,
scales::pvalue(p_int, accuracy = 0.001)
)
)
effectmod_primary_long <- effectmod_primary_long |>
mutate(
modifier_var = factor(
modifier_var,
levels = c("sex", "bmi_cat", "vitD25oh_cat")
),
level = as.character(level)
) |>
mutate(
level = case_when(
modifier_var == "bmi_cat" ~ factor(level, levels = bmi_level_order),
modifier_var == "vitD25oh_cat" ~ factor(level, levels = vitD_level_order),
TRUE ~ factor(level)
),
### Outcome order for primary outcomes
outcome_lab = factor(
outcome_lab,
levels = c(
"Asthma",
"Allergic rhinitis",
"Inflammatory arthritis",
"Diabetes",
"Hypertension"
)
)
) |>
arrange(
outcome_lab,
modifier_var,
level
)
effectmod_primary_display <- build_effectmod_display(effectmod_primary_long)
tbl_effectmod_primary <- effectmod_primary_display |>
select(
Outcome,
`Modifier level`,
`OR (95% CI)`,
`p for interaction`
) |>
gt(groupname_col = "Outcome") |>
tab_header(
title = md("**Table 9: Effect Modification of the Association Between Dietary Inflammatory Index and Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension**")
) |>
fmt_markdown(columns = everything()) |>
cols_label(
`Modifier level` = md("**Modifier level**"),
`OR (95% CI)` = md("**OR (95% CI)**"),
`p for interaction` = md("**p for interaction**")
) |>
tab_options(table.font.size = px(10)) |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_row_groups()
) |>
tab_footnote(
footnote = "NE = not estimatable due to low event rates or unstable estimates",
locations = cells_body(
columns = "OR (95% CI)",
rows = `OR (95% CI)` == "NE"
)
)
## Effect modification for secondary outcomes
effectmod_secondary_long <- map_dfr(
names(outcome_list_secondary),
function(out_nm) {
outcome_info <- outcome_list_secondary[[out_nm]]
map_dfr(
seq_len(nrow(effectmod_modifiers)),
function(i) {
mod_var <- effectmod_modifiers$modifier_var[i]
mod_label <- effectmod_modifiers$modifier_label[i]
fit_int <- fit_dii_interaction_model(
outcome_info = outcome_info,
covariates_base = covariates_m3,
modifier_var = mod_var
)
run_dii_effectmod(
fit = fit_int,
outcome_label = outcome_info$label,
modifier_var = mod_var,
modifier_label = mod_label
)
}
)
}
) |>
mutate(
outcome_lab = recode(
outcome_lab,
"Atopic dermatitis" = "Eczema"
),
`p for interaction` = ifelse(
is.na(p_int),
NA_character_,
scales::pvalue(p_int, accuracy = 0.001)
)
)
effectmod_secondary_long <- effectmod_secondary_long |>
mutate(
modifier_var = factor(
modifier_var,
levels = c("sex", "bmi_cat", "vitD25oh_cat")
),
level = as.character(level)
) |>
mutate(
level = case_when(
modifier_var == "bmi_cat" ~ factor(level, levels = bmi_level_order),
modifier_var == "vitD25oh_cat" ~ factor(level, levels = vitD_level_order),
TRUE ~ factor(level)
),
## Enforce outcome order for secondary outcomes:
## Eczema, Food allergy
outcome_lab = factor(
outcome_lab,
levels = c(
"Eczema",
"Food allergy"
)
)
) |>
arrange(
outcome_lab,
modifier_var,
level
)
effectmod_secondary_display <- build_effectmod_display(effectmod_secondary_long)
tbl_effectmod_secondary <- effectmod_secondary_display |>
select(
Outcome,
`Modifier level`,
`OR (95% CI)`,
`p for interaction`
) |>
gt(groupname_col = "Outcome") |>
tab_header(
title = md("**Table 10: Effect Modification of the Association Between Dietary Inflammatory Index and Eczema or Food Allergy**")
) |>
fmt_markdown(columns = everything()) |>
cols_label(
`Modifier level` = md("**Modifier level**"),
`OR (95% CI)` = md("**OR (95% CI)**"),
`p for interaction` = md("**p for interaction**")
) |>
tab_options(table.font.size = px(10)) |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_row_groups()
) |>
tab_footnote(
footnote = "NE = not estimatable due to low event rates or unstable estimates",
locations = cells_body(
columns = "OR (95% CI)",
rows = `OR (95% CI)` == "NE"
)
)
# Print effect modification tables-------------------------------------------------------------
tbl_effectmod_primary| Table 9: Effect Modification of the Association Between Dietary Inflammatory Index and Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension | ||
|---|---|---|
| Modifier level | OR (95% CI) | p for interaction |
| Asthma | ||
| Asthma by sex | 0.072 | |
| Female | 0.95 (0.91, 1.00) | |
| Male | 1.01 (0.96, 1.06) | |
| Asthma by BMI category | 0.808 | |
| Underweight (<18.5) | 1.04 (0.73, 1.48) | |
| Normal (18.5–24.9) | 0.97 (0.91, 1.03) | |
| Overweight (25–29.9) | 0.99 (0.92, 1.07) | |
| Obesity I (30–34.9) | 0.96 (0.89, 1.04) | |
| Obesity II (35–39.9) | 1.03 (0.93, 1.14) | |
| Obesity III (≥40) | 0.97 (0.87, 1.09) | |
| Asthma by Vitamin D category | 0.842 | |
| <20 (deficient) | 1.00 (0.79, 1.26) | |
| 20–29 (insufficient) | 0.94 (0.81, 1.09) | |
| ≥30 (sufficient) | 0.97 (0.93, 1.02) | |
| Allergic rhinitis | ||
| Allergic rhinitis by sex | 0.337 | |
| Female | 0.95 (0.89, 1.00) | |
| Male | 0.99 (0.93, 1.05) | |
| Allergic rhinitis by BMI category | 0.335 | |
| Underweight (<18.5) | 0.85 (0.62, 1.18) | |
| Normal (18.5–24.9) | 0.96 (0.91, 1.02) | |
| Overweight (25–29.9) | 0.98 (0.91, 1.06) | |
| Obesity I (30–34.9) | 1.00 (0.92, 1.08) | |
| Obesity II (35–39.9) | 1.00 (0.87, 1.16) | |
| Obesity III (≥40) | 0.81 (0.67, 0.97) | |
| Allergic rhinitis by Vitamin D category | 0.861 | |
| <20 (deficient) | 1.02 (0.83, 1.25) | |
| 20–29 (insufficient) | 0.97 (0.79, 1.18) | |
| ≥30 (sufficient) | 0.96 (0.92, 1.01) | |
| Inflammatory arthritis | ||
| Inflammatory arthritis by sex | 0.263 | |
| Female | 1.02 (0.88, 1.18) | |
| Male | 1.15 (0.94, 1.40) | |
| Inflammatory arthritis by BMI category | <0.001 | |
| Underweight (<18.5) | NE1 | |
| Normal (18.5–24.9) | 1.04 (0.89, 1.22) | |
| Overweight (25–29.9) | 1.02 (0.87, 1.18) | |
| Obesity I (30–34.9) | 1.10 (0.78, 1.55) | |
| Obesity II (35–39.9) | 1.10 (0.93, 1.31) | |
| Obesity III (≥40) | 1.12 (0.69, 1.84) | |
| Inflammatory arthritis by Vitamin D category | 0.603 | |
| <20 (deficient) | 0.72 (0.23, 2.25) | |
| 20–29 (insufficient) | 0.93 (0.72, 1.21) | |
| ≥30 (sufficient) | 1.08 (0.92, 1.26) | |
| Diabetes | ||
| Diabetes by sex | 0.641 | |
| Female | 1.04 (0.91, 1.19) | |
| Male | 0.99 (0.85, 1.15) | |
| Diabetes by BMI category | 0.994 | |
| Underweight (<18.5) | 0.98 (0.87, 1.10) | |
| Normal (18.5–24.9) | 1.04 (0.72, 1.49) | |
| Overweight (25–29.9) | 1.01 (0.84, 1.21) | |
| Obesity I (30–34.9) | 1.03 (0.83, 1.27) | |
| Obesity II (35–39.9) | 0.98 (0.82, 1.16) | |
| Obesity III (≥40) | 1.01 (0.87, 1.19) | |
| Diabetes by Vitamin D category | <0.001 | |
| <20 (deficient) | 1.70 (1.13, 2.57) | |
| 20–29 (insufficient) | 0.81 (0.65, 0.99) | |
| ≥30 (sufficient) | 1.05 (0.93, 1.18) | |
| Hypertension | ||
| Hypertension by sex | 0.285 | |
| Female | 1.06 (0.99, 1.15) | |
| Male | 1.01 (0.96, 1.07) | |
| Hypertension by BMI category | 0.135 | |
| Underweight (<18.5) | 1.08 (0.82, 1.42) | |
| Normal (18.5–24.9) | 1.07 (0.98, 1.16) | |
| Overweight (25–29.9) | 1.01 (0.94, 1.09) | |
| Obesity I (30–34.9) | 1.10 (1.01, 1.20) | |
| Obesity II (35–39.9) | 0.97 (0.87, 1.08) | |
| Obesity III (≥40) | 0.98 (0.87, 1.09) | |
| Hypertension by Vitamin D category | 0.047 | |
| <20 (deficient) | 0.82 (0.64, 1.05) | |
| 20–29 (insufficient) | 0.94 (0.82, 1.07) | |
| ≥30 (sufficient) | 1.05 (1.01, 1.10) | |
| 1 NE = not estimatable due to low event rates or unstable estimates | ||
tbl_effectmod_secondary| Table 10: Effect Modification of the Association Between Dietary Inflammatory Index and Eczema or Food Allergy | ||
|---|---|---|
| Modifier level | OR (95% CI) | p for interaction |
| Eczema | ||
| Eczema by sex | 0.341 | |
| Female | 1.02 (0.83, 1.26) | |
| Male | 0.90 (0.76, 1.06) | |
| Eczema by BMI category | 0.012 | |
| Underweight (<18.5) | 2.20 (1.25, 3.88) | |
| Normal (18.5–24.9) | 0.95 (0.76, 1.17) | |
| Overweight (25–29.9) | 1.04 (0.83, 1.30) | |
| Obesity I (30–34.9) | 0.86 (0.69, 1.08) | |
| Obesity II (35–39.9) | 1.24 (0.73, 2.10) | |
| Obesity III (≥40) | 0.86 (0.60, 1.23) | |
| Eczema by Vitamin D category | 0.184 | |
| <20 (deficient) | 0.57 (0.21, 1.54) | |
| 20–29 (insufficient) | 1.22 (0.79, 1.89) | |
| ≥30 (sufficient) | 0.96 (0.85, 1.09) | |
| Food allergy | ||
| Food allergy by sex | 0.322 | |
| Female | 0.93 (0.87, 1.00) | |
| Male | 0.98 (0.89, 1.08) | |
| Food allergy by BMI category | 0.988 | |
| Underweight (<18.5) | 0.75 (0.36, 1.56) | |
| Normal (18.5–24.9) | 0.96 (0.87, 1.07) | |
| Overweight (25–29.9) | 0.95 (0.85, 1.07) | |
| Obesity I (30–34.9) | 0.95 (0.83, 1.09) | |
| Obesity II (35–39.9) | 0.98 (0.81, 1.17) | |
| Obesity III (≥40) | 0.93 (0.78, 1.11) | |
| Food allergy by Vitamin D category | 0.313 | |
| <20 (deficient) | 0.80 (0.64, 1.00) | |
| 20–29 (insufficient) | 0.97 (0.84, 1.14) | |
| ≥30 (sufficient) | 0.95 (0.88, 1.02) | |
Machine learning analyses
Exploratory machine learning models were developed to identify predictors of disease beyond those detected in traditional regression analyses. For each outcome, four algorithms – namely LR, LASSO, RF, and XGB – were trained on one of two predictor sets: (1) core demographic, psychosocial, clinical, and dietary covariates, including the total DII score, and (2) quartiles of individual nutrient intakes. ROC curves (Figures 4-5) and AUROC values (Tables 11–12) summarize model performance for models with either core covariates or nutrients only, respectively. Diabetes demonstrated the strongest performance (AUROC 0.83–0.92), followed by hypertension (AUROC 0.72–0.76) and inflammatory arthritis (AUROC 0.64–0.76). Performance was more modest for allergic rhinitis (AUROC 0.62–0.66) and asthma (AUROC 0.61–0.62), and was notably lower for food allergy (AUROC 0.49–0.59) and eczema (AUROC 0.44–0.60). Across all outcomes, models trained using nutrient-intake quartiles showed minimal discriminative ability (AUROC 0.46–0.65).
Feature importance analyses were performed to acquire insight into potentially important predictors related to disease outcomes. In analyses with core covariate predictors (Figures 6–12), the DII score consistently ranked among the top 10 predictors for every disease outcome in at last three out of four models per disease outcome. For allergic and immunologic outcomes, DII scores, cardiometabolic factors, and psychosocial variables were influential predictors. For diabetes and hypertension, influential predictors included elevated HbA1c, elevated lipid markers, and adiposity. In contrast to the core covariate models, nutrient-only models (Figures 16–19) demonstrated inconsistent patterns across diseases. Nutrients such as polyunsaturated fatty acids (PUFAs), omega-6 fatty accids, caffeine, total caloric intake, iron, and B vitamins were observed but without a clear disease-specific consistency.
# Model run options, outcome definitions, and helper functions---------------------------------
options(yardstick.event_first = "second") # second outcome is disease
## Trial mode. TRUE = abbreviated, trial run (3-fold CV, smaller grids) for exploratory analysis; FALSE = full run (10-fold CV, larger grids) for final analysis
trial_mode <- FALSE
trial_max_n <- 2000 # maximum complete cases per outcome for trial runs
tidymodels_prefer()
ml_outcomes <- list(
asthma = list(
label = "Asthma",
data = asthma_reg,
outcome = "asthma",
is_primary = TRUE
),
ar = list(
label = "Allergic rhinitis",
data = ar_reg,
outcome = "ar",
is_primary = TRUE
),
arthritis = list(
label = "Inflammatory arthritis",
data = inflam_reg,
outcome = "arthritis",
is_primary = TRUE
),
htn = list(
label = "Hypertension",
data = htn_reg,
outcome = "htn",
is_primary = TRUE
),
diabetes = list(
label = "Diabetes",
data = dm_reg,
outcome = "diabetes",
is_primary = TRUE
),
ad = list(
label = "Atopic dermatitis",
data = ad_reg,
outcome = "ad",
is_primary = FALSE
),
fa = list(
label = "Food allergy",
data = fa_reg,
outcome = "fa",
is_primary = FALSE
)
)
## Helper function to add trend variables
add_ml_trend_vars <- function(data) {
### Vitamin D trend
if ("vitD25oh_cat" %in% names(data) && !"vitD25oh_cat_trend" %in% names(data)) {
data <- data |>
mutate(
vitD25oh_cat_trend = if_else(
!is.na(vitD25oh_cat),
as.numeric(vitD25oh_cat) - 1,
NA_real_
)
)
}
### DII quartile trend
if ("dii_quart" %in% names(data) && !"dii_quart_trend" %in% names(data)) {
data <- data |>
mutate(
dii_quart_trend = if_else(
!is.na(dii_quart),
as.numeric(dii_quart) - 1,
NA_real_
)
)
}
data
}
# Predictor sets-------------------------------------------------------------------------------
# Set 1 variabvles: core covariates
predictors_set1 <- c(
"age",
"sex",
"race",
"insur",
"educ",
"pir",
"food_sec",
"smoking",
"phys_act",
"alcohol",
"sbp_cat",
"dbp_cat",
"bmi_cat",
"whtr_cat",
"hdl",
"non_hdl",
"tchol",
"hba1c",
"vitD25oh_cat",
"vitD25oh_cat_trend",
"dii",
"dii_quart_trend",
"dii_etoh"
)
# Set 2 variables: individual nutrient quartiles
predictors_set2 <- c(
"kcal_quart", "carb_quart", "protein_quart", "totalfat_quart", "satfat_quart",
"pufa_quart", "mufa_quart", "n3fat_quart", "n6fat_quart", "choles_quart",
"vitA_quart", "bcarotene_quart", "vitC_quart", "vitE_quart",
"vitB6_quart", "vitB12_quart", "folicacid_quart",
"thiamin_quart", "riboflavin_quart", "niacin_quart",
"iron_quart", "mg_quart", "zn_quart", "se_quart",
"fiber_quart", "caffeine_quart"
)
# Helper functions-----------------------------------------------------------------------------
## Helper function to prepare ML data
prepare_ml_data <- function(data, outcome, predictors) {
### Add trend scores
data <- add_ml_trend_vars(data)
### Keep only outcome and predictors
vars_keep <- intersect(c(outcome, predictors), names(data))
df <- data |>
select(all_of(vars_keep)) |>
drop_na() |>
mutate(
!!outcome := factor(
.data[[outcome]],
levels = c(0, 1),
labels = c("No", "Yes")
)
)
### For trial runs: sub-sample if enough events and keep all rows for rare outcomes
if (trial_mode && nrow(df) > trial_max_n) {
case_n <- sum(df[[outcome]] == "Yes")
min_events_for_subsample <- 150L
if (case_n >= min_events_for_subsample) {
df <- slice_sample(df, n = trial_max_n)
}
}
df
}
## Helper function to train/test-split data and perform cross-validation
set.seed(1234)
make_splits_and_folds <- function(df, outcome) {
v_folds <- if (trial_mode) 3L else 10L
split_obj <- initial_split(
df,
strata = !!rlang::sym(outcome),
prop = 0.8
)
train_df <- training(split_obj)
test_df <- testing(split_obj)
folds <- vfold_cv(
train_df,
v = v_folds,
strata = !!rlang::sym(outcome)
)
list(
split = split_obj,
train = train_df,
test = test_df,
folds = folds
)
}
## Helper function for recipes
make_ml_recipe <- function(df, outcome, predictors) {
recipe(
formula = stats::as.formula(paste(outcome, "~", ".")),
data = df
) |>
step_dummy(all_nominal_predictors()) |>
step_zv(all_predictors()) |>
step_normalize(all_predictors())
}
# Model specifications-------------------------------------------------------------------------
lr_spec <-
logistic_reg() |>
set_engine("glm") |>
set_mode("classification")
lasso_spec <-
logistic_reg(
penalty = tune(),
mixture = 1
) |>
set_engine("glmnet") |>
set_mode("classification")
rf_spec_base <-
rand_forest(
trees = if (trial_mode) 500L else 1000L,
mtry = tune(),
min_n = 5L
) |>
set_engine("randomForest", importance = TRUE) |>
set_mode("classification")
xgb_spec_base <-
boost_tree(
trees = tune(),
tree_depth = tune(),
learn_rate = tune(),
min_n = 5L,
sample_size = 1.0,
loss_reduction = 0
) |>
set_mode("classification") |>
set_engine("xgboost")
## Helper function for logistic regression metrics and variable importance
get_lr_vi <- function(lr_fit_workflow) {
glm_fit <- extract_fit_engine(lr_fit_workflow)
broom::tidy(glm_fit) |>
filter(term != "(Intercept)") |>
transmute(
Variable = term,
Importance = abs(estimate),
model = "Logistic regression"
)
}
# Machine learning pipeline -------------------------------------------------------------------
run_ml_pipeline <- function(predictors, predictor_set_label) {
## Datasets and sample sizes
ml_data_list <- purrr::imap(
ml_outcomes,
~ prepare_ml_data(
data = .x$data,
outcome = .x$outcome,
predictors = predictors
)
)
ml_sample_sizes <- purrr::imap_dfr(
ml_data_list,
~ tibble(
outcome = .y,
outcome_label = ml_outcomes[[.y]]$label,
N_ml = nrow(.x),
predictor_set = predictor_set_label
)
)
ml_sample_sizes_gt <- ml_sample_sizes |>
gt() |>
tab_header(
title = paste0("ML analytic sample sizes by outcome — ", predictor_set_label)
) |>
cols_label(
outcome = "Outcome (internal name)",
outcome_label = "Outcome",
N_ml = "Complete-case N (ML)",
predictor_set = "Predictor set"
)
## Train/test splits and cross-validation folds
ml_resampling <- purrr::imap(
ml_data_list,
~ make_splits_and_folds(.x, outcome = ml_outcomes[[.y]]$outcome)
)
## Recipes and feature counts
ml_recipes <- purrr::imap(
ml_resampling,
~ make_ml_recipe(
df = .x$train,
outcome = ml_outcomes[[.y]]$outcome,
predictors = predictors
)
)
ml_feature_counts <- purrr::imap_dfr(
ml_recipes,
~ {
prep_rec <- prep(.x)
baked <- juice(prep_rec)
tibble(
outcome = .y,
outcome_label = ml_outcomes[[.y]]$label,
n_features = ncol(baked) - 1,
predictor_set = predictor_set_label
)
}
)
ml_feature_counts_gt <- ml_feature_counts |>
gt() |>
tab_header(
title = paste0("Number of ML predictors after preprocessing — ", predictor_set_label)
) |>
cols_label(
outcome = "Outcome (internal name)",
outcome_label = "Outcome",
n_features = "Number of predictors",
predictor_set = "Predictor set"
)
## Tuning grids
lasso_grid <- grid_regular(
penalty(),
levels = if (trial_mode) 5L else 30L
)
rf_grid <- grid_regular(
mtry(range = c(5L, min(30L, length(predictors)))),
levels = if (trial_mode) 3L else 5L
)
set.seed(1234)
xgb_grid <- grid_latin_hypercube(
trees(range = c(100L, 300L)),
tree_depth(range = c(3L, 5L)),
learn_rate(range = c(-2, -1)),
size = if (trial_mode) 10L else 20L
)
## Workflows per outcome
make_workflows_for_outcome <- function(recipe_obj, outcome_name, train_df) {
list(
lr = workflow() |>
add_model(lr_spec) |>
add_recipe(recipe_obj),
lasso = workflow() |>
add_model(lasso_spec) |>
add_recipe(recipe_obj),
rf = workflow() |>
add_model(rf_spec_base) |>
add_recipe(recipe_obj),
xgb = workflow() |>
add_model(xgb_spec_base) |>
add_recipe(recipe_obj)
)
}
ml_workflows <- purrr::imap(
ml_resampling,
~ make_workflows_for_outcome(
recipe_obj = ml_recipes[[.y]],
outcome_name = .y,
train_df = .x$train
)
)
ctrl_resamples <- control_grid(
save_pred = TRUE,
parallel_over = "resamples",
verbose = TRUE
)
## Model tuning and cross-validation for each outcome
tune_models_for_outcome <- function(outcome_name) {
resampling <- ml_resampling[[outcome_name]]
workflows_out <- ml_workflows[[outcome_name]]
lr_res <- workflows_out$lr |>
fit_resamples(
resamples = resampling$folds,
control = control_resamples(save_pred = TRUE),
metrics = metric_set(
roc_auc
)
)
lasso_res <- workflows_out$lasso |>
tune_grid(
resamples = resampling$folds,
grid = lasso_grid,
control = ctrl_resamples,
metrics = metric_set(roc_auc)
)
rf_res <- workflows_out$rf |>
tune_grid(
resamples = resampling$folds,
grid = rf_grid,
control = ctrl_resamples,
metrics = metric_set(roc_auc)
)
xgb_res <- workflows_out$xgb |>
tune_grid(
resamples = resampling$folds,
grid = xgb_grid,
control = ctrl_resamples,
metrics = metric_set(roc_auc)
)
list(
lr = lr_res,
lasso = lasso_res,
rf = rf_res,
xgb = xgb_res
)
}
set.seed(1234)
ml_tuning_results <- purrr::imap(
ml_outcomes,
~ tune_models_for_outcome(.y)
)
## Best hyperparameters
get_best_params <- function(tune_res) {
list(
lasso = select_best(tune_res$lasso, metric = "roc_auc"),
rf = select_best(tune_res$rf, metric = "roc_auc"),
xgb = select_best(tune_res$xgb, metric = "roc_auc")
)
}
ml_best_params <- purrr::imap(
ml_tuning_results,
~ get_best_params(.x)
)
## Final model fits and test set performance
fit_and_evaluate_final_models <- function(outcome_name) {
outcome_info <- ml_outcomes[[outcome_name]]
resampling <- ml_resampling[[outcome_name]]
workflows_out <- ml_workflows[[outcome_name]]
best_pars <- ml_best_params[[outcome_name]]
outcome_var <- outcome_info$outcome
train_df <- resampling$train
test_df <- resampling$test
### Logistic regression
final_lr_fit <- workflows_out$lr |>
fit(data = train_df)
lr_preds_test <- bind_cols(
truth = test_df[[outcome_var]],
predict(final_lr_fit, test_df),
predict(final_lr_fit, test_df, type = "prob")
)
lr_metrics <- roc_auc(
lr_preds_test,
truth,
.pred_Yes,
event_level = "second"
) |>
mutate(
outcome = outcome_info$label,
model = "Logistic regression",
set = "Full predictors",
outcome_internal = outcome_name,
predictor_set = predictor_set_label
)
### LASSO
final_lasso_wf <- workflows_out$lasso |>
finalize_workflow(best_pars$lasso)
final_lasso_fit <- final_lasso_wf |>
fit(data = train_df)
lasso_preds_test <- bind_cols(
truth = test_df[[outcome_var]],
predict(final_lasso_fit, test_df),
predict(final_lasso_fit, test_df, type = "prob")
)
lasso_metrics <- roc_auc(
lasso_preds_test,
truth,
.pred_Yes,
event_level = "second"
) |>
mutate(
outcome = outcome_info$label,
model = "LASSO",
set = "Full predictors",
outcome_internal = outcome_name,
predictor_set = predictor_set_label
)
### Random forest
final_rf_wf <- workflows_out$rf |>
finalize_workflow(best_pars$rf)
final_rf_fit <- final_rf_wf |>
fit(data = train_df)
rf_preds_test <- bind_cols(
truth = test_df[[outcome_var]],
predict(final_rf_fit, test_df),
predict(final_rf_fit, test_df, type = "prob")
)
rf_metrics <- roc_auc(
rf_preds_test,
truth,
.pred_Yes,
event_level = "second"
) |>
mutate(
outcome = outcome_info$label,
model = "Random forest",
set = "Full predictors",
outcome_internal = outcome_name,
predictor_set = predictor_set_label
)
### XGBoost
final_xgb_wf <- workflows_out$xgb |>
finalize_workflow(best_pars$xgb)
final_xgb_fit <- final_xgb_wf |>
fit(data = train_df)
xgb_preds_test <- bind_cols(
truth = test_df[[outcome_var]],
predict(final_xgb_fit, test_df),
predict(final_xgb_fit, test_df, type = "prob")
)
xgb_metrics <- roc_auc(
xgb_preds_test,
truth,
.pred_Yes,
event_level = "second"
) |>
mutate(
outcome = outcome_info$label,
model = "XGBoost",
set = "Full predictors",
outcome_internal = outcome_name,
predictor_set = predictor_set_label
)
list(
metrics = bind_rows(
lr_metrics,
lasso_metrics,
rf_metrics,
xgb_metrics
),
preds = list(
lr = lr_preds_test,
lasso = lasso_preds_test,
rf = rf_preds_test,
xgb = xgb_preds_test
),
final_fits = list(
lr = final_lr_fit,
lasso = final_lasso_fit,
rf = final_rf_fit,
xgb = final_xgb_fit
)
)
}
ml_final_results <- purrr::imap(
ml_outcomes,
~ fit_and_evaluate_final_models(.y)
)
### Performance summary data
ml_performance_summary_long <- purrr::map_dfr(
names(ml_final_results),
~ ml_final_results[[.x]]$metrics
)
ml_performance_table <- ml_performance_summary_long |>
select(
predictor_set,
outcome_internal,
outcome,
model,
set,
.metric,
.estimate
) |>
pivot_wider(
id_cols = c(predictor_set, outcome_internal, outcome, model, set),
names_from = .metric,
values_from = .estimate
) |>
rename(
AUROC = roc_auc
) |>
arrange(outcome, desc(AUROC))
ml_performance_table_gt <- ml_performance_table |>
gt(rowname_col = "model", groupname_col = "outcome") |>
tab_header(
title = paste0("Test-set performance of ML models by outcome — ",
predictor_set_label)
) |>
fmt_number(
columns = c(AUROC),
decimals = 3
) |>
cols_label(
predictor_set = "Predictor set",
outcome_internal = "Outcome (internal)",
outcome = "Outcome",
set = "Predictor usage",
AUROC = "AUROC"
)
### Variable importance data
get_vi_all_models_for_outcome <- function(outcome_name) {
fits <- ml_final_results[[outcome_name]]$final_fits
lr_vi <- get_lr_vi(fits$lr)
lasso_vi <- vip::vi(fits$lasso) |>
mutate(
model = "LASSO",
Variable = as.character(Variable)
)
rf_vi <- vip::vi(fits$rf) |>
mutate(
model = "Random forest",
Variable = as.character(Variable)
)
xgb_vi <- vip::vi(fits$xgb) |>
mutate(
model = "XGBoost",
Variable = as.character(Variable)
)
bind_rows(lr_vi, lasso_vi, rf_vi, xgb_vi)
}
ml_vi_all <- purrr::imap(
ml_outcomes,
~ get_vi_all_models_for_outcome(.y)
)
### Pipeline outputs
list(
predictor_set_label = predictor_set_label,
predictors = predictors,
ml_sample_sizes = ml_sample_sizes,
ml_sample_sizes_gt = ml_sample_sizes_gt,
ml_feature_counts = ml_feature_counts,
ml_feature_counts_gt = ml_feature_counts_gt,
ml_data_list = ml_data_list,
ml_resampling = ml_resampling,
ml_recipes = ml_recipes,
ml_tuning_results = ml_tuning_results,
ml_best_params = ml_best_params,
ml_final_results = ml_final_results,
ml_performance_table = ml_performance_table,
ml_performance_table_gt = ml_performance_table_gt,
ml_vi_all = ml_vi_all
)
}
# Run machine learning pipelines for set 1 (core covariates) and set 2 (individual nutritional covariates)------------------------------------------------------------------------------------
results_set1 <- run_ml_pipeline(
predictors = predictors_set1,
predictor_set_label = "Set 1: main regression predictors"
)
results_set2 <- run_ml_pipeline(
predictors = predictors_set2,
predictor_set_label = "Set 2: nutrient quartiles"
)
ml_performance_all_stage1 <- bind_rows(
results_set1$ml_performance_table,
results_set2$ml_performance_table
)
# ROC curves-----------------------------------------------------------------------------------
## ROC curves for all outcomes by predictors
make_roc_panel <- function(results_obj, title_text) {
desired_order <- c(
"Asthma",
"Allergic rhinitis",
"Inflammatory\narthritis",
"Diabetes",
"Hypertension",
"Eczema",
"Food allergy"
)
outcome_label_map <- c(
asthma = "Asthma",
ar = "Allergic rhinitis",
arthritis = "Inflammatory\narthritis",
diabetes = "Diabetes",
htn = "Hypertension",
ad = "Eczema",
fa = "Food allergy"
)
### Model color mapping
custom_colors <- c(
"LR" = "#E69F00", # logistic regression
"Lasso" = "#009E73", # LASSO
"RF" = "#0072B2", # random forest
"XGB" = "#CC79A7" # XGBoost
)
outcome_names <- names(ml_outcomes)
roc_long <- purrr::map_dfr(
outcome_names,
function(out_nm) {
preds <- results_obj$ml_final_results[[out_nm]]$preds
outcome_label <- outcome_label_map[[out_nm]]
bind_rows(
roc_curve(preds$lr, truth, .pred_Yes, event_level = "second") |>
mutate(model = "LR"),
roc_curve(preds$lasso, truth, .pred_Yes, event_level = "second") |>
mutate(model = "Lasso"),
roc_curve(preds$rf, truth, .pred_Yes, event_level = "second") |>
mutate(model = "RF"),
roc_curve(preds$xgb, truth, .pred_Yes, event_level = "second") |>
mutate(model = "XGB")
) |>
mutate(
outcome = factor(outcome_label, levels = desired_order)
)
}
)
ggplot(
roc_long,
aes(x = 1 - specificity, y = sensitivity, color = model)
) +
geom_line(linewidth = 0.9) +
geom_abline(linetype = "dashed") +
coord_equal() +
facet_wrap(~ outcome, ncol = 5, drop = FALSE) +
labs(
title = title_text,
x = "1 − Specificity",
y = "Sensitivity",
color = "Model"
) +
scale_color_manual(
values = custom_colors,
breaks = c("LR", "Lasso", "RF", "XGB")
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold"),
legend.position = "bottom",
legend.title = element_text(size = 12, face = "bold"),
legend.text = element_text(size = 12),
strip.text = element_text(size = 12, face = "bold"),
panel.spacing.x = grid::unit(1.2, "lines"),
panel.spacing.y = grid::unit(1.2, "lines")
)
}
## ROC curves for models with core covariates
roc_panel_set1 <- make_roc_panel(
results_set1,
"Figure 4: Receiver Operating Characteristic Curves\nfor Machine Learning Models With Core Covariates"
)
## ROC curves for models with individual nutritional covariates
roc_panel_set2 <- make_roc_panel(
results_set2,
"Figure 5: Receiver Operating Characteristic Curves\nfor Machine Learning Models With Individual Nutritional Covariates"
)
# Machine learning model performance summary tables -------------------------------------------
## Desired outcome order
desired_outcome_order <- c(
"Asthma",
"Allergic rhinitis",
"Inflammatory arthritis",
"Diabetes",
"Hypertension",
"Eczema",
"Food allergy"
)
## Core covariates performance summary (set 1)
core_perf <- results_set1$ml_performance_table |>
mutate(
outcome = recode(outcome, "Atopic dermatitis" = "Eczema"),
outcome = factor(outcome, levels = desired_outcome_order),
model = factor(
model,
levels = c(
"Logistic regression",
"LASSO",
"Random forest",
"XGBoost"
)
)
) |>
arrange(outcome, model)
core_perf_gt <- core_perf |>
select(
outcome,
model,
AUROC
) |>
gt(rowname_col = "model", groupname_col = "outcome") |>
tab_header(
title = md("**Table 11: Machine Learning Model Performance Summary for Models With Core Covariates**")
) |>
fmt_number(
columns = c(AUROC),
decimals = 3
) |>
cols_label(
outcome = md("**Disease**"),
model = md("**Model**"),
AUROC = md("**AUROC**")
) |>
tab_options(
table.width = pct(70)
) |>
tab_style(
style = list(
cell_text(weight = "bold")
),
locations = list(
cells_title("title"),
cells_column_labels(everything()),
cells_row_groups()
)
)
## Individual nutritional covariates performance summary (set 2)
nutr_perf <- results_set2$ml_performance_table |>
mutate(
outcome = recode(outcome, "Atopic dermatitis" = "Eczema"),
outcome = factor(outcome, levels = desired_outcome_order),
model = factor(
model,
levels = c(
"Logistic regression",
"LASSO",
"Random forest",
"XGBoost"
)
)
) |>
arrange(outcome, model)
nutr_perf_gt <- nutr_perf |>
select(
outcome,
model,
AUROC
) |>
gt(rowname_col = "model", groupname_col = "outcome") |>
tab_header(
title = md("**Table 12: Machine Learning Model Performance Summary for Models With Individual Nutritional Covariates**")
) |>
fmt_number(
columns = c(AUROC),
decimals = 3
) |>
cols_label(
outcome = md("**Disease**"),
model = md("**Model**"),
AUROC = md("**AUROC**")
) |>
tab_options(
table.width = pct(70)
) |>
tab_style(
style = list(
cell_text(weight = "bold")
),
locations = list(
cells_title("title"),
cells_column_labels(everything()),
cells_row_groups()
)
)
# Settings for ranked variable importance figures----------------------------------------------
## Disease-specific color map
disease_color_map <- c(
"Asthma" = "#5E3C99",
"Allergic Rhinitis" = "#89CFF0",
"Inflammatory Arthritis" = "#009E73",
"Diabetes" = "#F0E442",
"Hypertension" = "#D55E00",
"Eczema" = "#E78AC3",
"Food Allergy" = "#4B2E0F"
)
make_vi_plot_single <- function(results_obj,
outcome_name,
disease_title,
predictor_set_desc,
figure_label,
top_n = 10) {
vi_df <- results_obj$ml_vi_all[[outcome_name]]
vi_df <- vi_df |>
filter(!is.na(Importance), Importance != 0)
vi_df <- vi_df |>
mutate(
model = factor(
model,
levels = c(
"Logistic regression",
"LASSO",
"Random forest",
"XGBoost"
)
)
)
vi_plot_df <- vi_df |>
group_by(model) |>
slice_max(order_by = Importance, n = top_n, with_ties = FALSE) |>
arrange(Importance, .by_group = TRUE) |>
mutate(
Variable_plot = paste(model, Variable, sep = "__"),
Variable_plot = factor(Variable_plot, levels = unique(Variable_plot))
) |>
ungroup()
ggplot(
vi_plot_df,
aes(x = Importance, y = Variable_plot)
) +
geom_col(fill = disease_color_map[disease_title]) +
facet_wrap(~ model, scales = "free") +
scale_y_discrete(
labels = function(x) sub("^[^_]+__", "", x)
) +
labs(
title = paste0(
figure_label,
": Ranked Feature Importance of ",
predictor_set_desc,
"\nin Machine Learning Models for ",
disease_title
),
x = "Importance",
y = "Predictor"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold"),
strip.text = element_text(size = 11, face = "bold"),
panel.spacing = grid::unit(1, "lines")
)
}
disease_title_map <- c(
asthma = "Asthma",
ar = "Allergic Rhinitis",
arthritis = "Inflammatory Arthritis",
diabetes = "Diabetes",
htn = "Hypertension",
ad = "Eczema",
fa = "Food Allergy"
)
core_figure_labels <- c(
asthma = "Figure 6",
ar = "Figure 7",
arthritis = "Figure 8",
diabetes = "Figure 9",
htn = "Figure 10",
ad = "Figure 11",
fa = "Figure 12"
)
nutr_figure_labels <- c(
asthma = "Figure 13",
ar = "Figure 14",
arthritis = "Figure 15",
diabetes = "Figure 16",
htn = "Figure 17",
ad = "Figure 18",
fa = "Figure 19"
)
# Generate core covariate (set 1) variable importance plots------------------------------------
vi_core_plots <- purrr::imap(
disease_title_map,
~ make_vi_plot_single(
results_obj = results_set1,
outcome_name = .y,
disease_title = .x,
predictor_set_desc = "Core Covariates",
figure_label = core_figure_labels[[.y]]
)
)
vi_core_asthma <- vi_core_plots$asthma # Figure 6
vi_core_ar <- vi_core_plots$ar # Figure 7
vi_core_arthritis <- vi_core_plots$arthritis # Figure 8
vi_core_diabetes <- vi_core_plots$diabetes # Figure 9
vi_core_htn <- vi_core_plots$htn # Figure 10
vi_core_eczema <- vi_core_plots$ad # Figure 11
vi_core_food_allergy <- vi_core_plots$fa # Figure 12
# Generate individual nutritional covariate (set 2) variable importance plots -----------------
vi_nutr_plots <- purrr::imap(
disease_title_map,
~ make_vi_plot_single(
results_obj = results_set2,
outcome_name = .y,
disease_title = .x,
predictor_set_desc = "Individual Nutritional Covariates",
figure_label = nutr_figure_labels[[.y]]
)
)
vi_nutr_asthma <- vi_nutr_plots$asthma # Figure 13
vi_nutr_ar <- vi_nutr_plots$ar # Figure 14
vi_nutr_arthritis <- vi_nutr_plots$arthritis # Figure 15
vi_nutr_diabetes <- vi_nutr_plots$diabetes # Figure 16
vi_nutr_htn <- vi_nutr_plots$htn # Figure 17
vi_nutr_eczema <- vi_nutr_plots$ad # Figure 18
vi_nutr_food_allergy <- vi_nutr_plots$fa # Figure 19
# Print ML tables and figures------------------------------------------------------------------
## Core covariates performance summary
core_perf_gt| Table 11: Machine Learning Model Performance Summary for Models With Core Covariates | |
|---|---|
| AUROC | |
| Asthma | |
| Logistic regression | 0.606 |
| LASSO | 0.600 |
| Random forest | 0.552 |
| XGBoost | 0.587 |
| Allergic rhinitis | |
| Logistic regression | 0.658 |
| LASSO | 0.687 |
| Random forest | 0.664 |
| XGBoost | 0.693 |
| Inflammatory arthritis | |
| Logistic regression | 0.566 |
| LASSO | 0.518 |
| Random forest | 0.453 |
| XGBoost | 0.437 |
| Diabetes | |
| Logistic regression | 0.816 |
| LASSO | 0.934 |
| Random forest | 0.877 |
| XGBoost | 0.828 |
| Hypertension | |
| Logistic regression | 0.786 |
| LASSO | 0.797 |
| Random forest | 0.729 |
| XGBoost | 0.713 |
| Eczema | |
| Logistic regression | 0.549 |
| LASSO | 0.563 |
| Random forest | 0.538 |
| XGBoost | 0.510 |
| Food allergy | |
| Logistic regression | 0.571 |
| LASSO | 0.515 |
| Random forest | 0.467 |
| XGBoost | 0.451 |
## Individual nutritional covariates performance summary
nutr_perf_gt| Table 12: Machine Learning Model Performance Summary for Models With Individual Nutritional Covariates | |
|---|---|
| AUROC | |
| Asthma | |
| Logistic regression | 0.529 |
| LASSO | 0.518 |
| Random forest | 0.503 |
| XGBoost | 0.507 |
| Allergic rhinitis | |
| Logistic regression | 0.552 |
| LASSO | 0.597 |
| Random forest | 0.548 |
| XGBoost | 0.577 |
| Inflammatory arthritis | |
| Logistic regression | 0.553 |
| LASSO | 0.550 |
| Random forest | 0.556 |
| XGBoost | 0.551 |
| Diabetes | |
| Logistic regression | 0.643 |
| LASSO | 0.636 |
| Random forest | 0.524 |
| XGBoost | 0.588 |
| Hypertension | |
| Logistic regression | 0.561 |
| LASSO | 0.558 |
| Random forest | 0.526 |
| XGBoost | 0.538 |
| Eczema | |
| Logistic regression | 0.469 |
| LASSO | 0.425 |
| Random forest | 0.457 |
| XGBoost | 0.406 |
| Food allergy | |
| Logistic regression | 0.532 |
| LASSO | 0.482 |
| Random forest | 0.520 |
| XGBoost | 0.515 |
## ROC panels
print(roc_panel_set1)print(roc_panel_set2)## Core covariates variable importance
print(vi_core_asthma)print(vi_core_ar)print(vi_core_arthritis)print(vi_core_diabetes)print(vi_core_htn)print(vi_core_eczema)print(vi_core_food_allergy)## Individual nutritional covariates variable importance
print(vi_nutr_asthma)print(vi_nutr_ar)print(vi_nutr_arthritis)print(vi_nutr_diabetes)print(vi_nutr_htn)print(vi_nutr_eczema)print(vi_nutr_food_allergy)5 Conclusion and Discussion
In this nationally representative cohort of U.S. adults aged 20–40 years, I evaluated whether dietary inflammatory content, measured via the Dietary Inflammatory Index, was associated with allergic, immunologic, and cardiometabolic outcomes. This was performed using both survey-weighted regression and exploratory machine learning analyses. In regression analyses, modest signals for increased risk of allergic, immunologic, and/or cardiometabolic disease and pro-inflammatory diet were detected, but these associations were largely attenuated after adjustment for sociodemographic, behavioral, and anthropometric factors. Nutrient-specific analyses identified a few potentially protective micro- and macronutrients, though most associations were modest and not consistently significant. Interestingly, in machine learning models, DII consistently ranked among the top predictors in models that included core covariates from multiple domains, although machine learning model findings should be interpreted with caution given they were not robust in their predictive performance. Overall while DII may not independently predict diseases, it likely plays an incremental role in driving these diseases.
This study has several strengths, including use of nationally representative NHANES data with implementation of survey weighting, inclusion of a validated dietary inflammatory score and nutrient composition database, and the use of complementary analytic approaches to capture both adjusted associations and potentially important, machine learning-derived predictors. However, several important limitations must be acknowledged. First, the study’s cross-sectional design precludes causal inference. Second, the accuracy of self-reported medical diagnoses and 24-hour dietary recalls may be inaccurate. Moreover, self-reported eczema and food allergy were only available in restricted NHANES cycles, a reflection of survey cycle heterogeneity and shifting survey priorities. As analyses relied on complete cases, the potential impact of missing data on the results is unclear. For the machine learning models, several outcomes yielded AUROC values near 0.5, indicating limited predictive performance ability. In addition, the cross-sectional nature of the study limited the ability of these models to capture temporal patterns.
Future directions for this work include examining broader age ranges and a wider set of candidate diseases to further characterize how dietary inflammatory potential relates to risk across the lifespan. Longitudinal cohort studies, in which dietary patterns and clinical outcomes are tracked over time, would offer a complementary approach for addressing these questions and could be paired with multi-omic data, such as immune and metabolic biomarkers and microbiome profiles. In addition, future machine learning models could focus on enhancing model performance by identifying a more predictive set of features beyond those considered in the current analysis. Continued study and refinement of disease risk models that integrate dietary information may reveal insights to inform prevention and management strategies for chronic immune-mediated and metabolic diseases.
6 References
[1] Annesi-Maesano I, Fleddermann M, Hornef M, et al. Allergic diseases in infancy: Epidemiology and current interpretation. World Allergy Organ J. 2021;14(11):100591.
[2] Boddu SK, Giannini C, Marcovecchio ML. Metabolic disorders in young people around the world. Diabetologia.2025;68(11):2374–2385.
[3] Kim H, Sitarik AR, Woodcroft K, Johnson CC, Zoratti E. Birth mode, breastfeeding, pet exposure, and antibiotics: Associations with the gut microbiome and sensitization. Curr Allergy Asthma Rep. 2019;19(4):22.
[4] Sbihi H, Boutin RC, Cutler C, Suen M, Finlay BB, Turvey SE. How early-life environmental exposures shape the gut microbiome and influence allergic disease. Allergy. 2019;74(11):2103–2115.
[5] Kuniyoshi Y, Tsujimoto Y, Banno M, et al. Association of obesity or metabolic syndrome with allergic diseases: Overview of reviews. Obes Rev. 2025;26(3):e13862.
[6] Chang CL, Ali GB, Pham J, et al. Childhood BMI trajectories and asthma and allergies: A systematic review. Clin Exp Allergy. 2023;53(9):911–929.
[7] Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS). National Health and Nutrition Examination Survey Data, 2005–2012. Hyattsville, MD: US Department of Health and Human Services. Available from: https://wwwn.cdc.gov/nchs/nhanes/
[8] Xu Z. Improving the dietaryindex R package: A proposal to include additional components for more accurate DII computation in NHANES. Am J Clin Nutr. 2025 Jan;121(1):174–175. Epub 2024 Nov 9.
[9] Zhan JJ, Hodge RA, Dunlop AL, Lee MM, Bui L, Liang D, Ferranti EP. Dietaryindex: a user-friendly and versatile R package for standardizing dietary pattern analysis in epidemiological and clinical studies. Am J Clin Nutr. 2024 Nov;120(5):1165–1174. Epub 2024 Aug 23.
[10] Qing L, Zhu Y, Yu C, Zhang Y, Ni J. Exploring the association between dietary Inflammatory Index and chronic pain in US adults using NHANES 1999-2004. Sci Rep. 2024 Apr 16;14(1):8726.
7 Acknowledgements
I would like to thank Drs. Robert Grundmeier and Rich Tsui for their guidance on the analytic and methodological approaches used in this study. I am also grateful to the BMIN 5030 professors and teaching assistants for helping me expand my foundation in data analysis throughout the course. In addition to consulting with my mentors, several references and tools were utilized to aid in completing this project, including prior course lecture slides, practicum exercises, and online troubleshooting resources including Stack Overflow, Posit Forum, and ChatGPT (OpenAI). Selective assistance from ChatGPT was used for debugging and developing helper functions to automate repetitive data processing tasks as I refined the cohort. All analytic decisions, data processing, modeling choices, and code were conceptualized, implemented, and reviewed by me with input from my co-mentors. I also wish to acknowledge the dietaryindexNDP package, which enabled calculation of DII scores that were key to this analysis.